refactor: rename all chunk flags to include explicit units

Replace ambiguous --chunk-size / --chunk-overlap names that meant
different things in different contexts (tokens vs characters) with
fully explicit names:

- --chunk-size (RAG tokens) → --chunk-tokens
- --chunk-overlap (RAG tokens) → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars) → --streaming-overlap-chars
- --chunk-size (PDF pages) → --pdf-pages-per-chunk (poc file)

Also aligns stream_parser.py help with streaming_ingest.py standalone parser.

All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -70,8 +70,8 @@ PACKAGE_ARGUMENTS: dict[str, dict[str, Any]] = {
             "help": "Use streaming ingestion for large docs (memory-efficient)",
         },
     },
-    "streaming_chunk_size": {
-        "flags": ("--streaming-chunk-size",),
+    "streaming_chunk_chars": {
+        "flags": ("--streaming-chunk-chars",),
         "kwargs": {
             "type": int,
             "default": 4000,
@@ -79,8 +79,8 @@ PACKAGE_ARGUMENTS: dict[str, dict[str, Any]] = {
             "metavar": "N",
         },
     },
-    "streaming_overlap": {
-        "flags": ("--streaming-overlap",),
+    "streaming_overlap_chars": {
+        "flags": ("--streaming-overlap-chars",),
         "kwargs": {
             "type": int,
             "default": 200,
@@ -98,8 +98,8 @@ PACKAGE_ARGUMENTS: dict[str, dict[str, Any]] = {
         },
     },
     # RAG chunking options
-    "chunk": {
-        "flags": ("--chunk",),
+    "chunk_for_rag": {
+        "flags": ("--chunk-for-rag",),
         "kwargs": {
             "action": "store_true",
             "help": "Enable intelligent chunking for RAG platforms (auto-enabled for RAG adaptors)",
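
Reviewer note: the hunks above only show the registry entries, not how they reach argparse. The following is a minimal sketch, assuming each entry is passed to `add_argument` as `*flags, dest=<key>, **kwargs`. The `ARGUMENTS` excerpt, the `build_parser` helper, the `prog` name, and the two streaming help strings are hypothetical and are not taken from the repository. Only the flag names, types, defaults, and the `--chunk-for-rag` help text come from the diff.

```python
import argparse
from typing import Any

# Hypothetical excerpt shaped like the PACKAGE_ARGUMENTS entries in the diff.
ARGUMENTS: dict[str, dict[str, Any]] = {
    "streaming_chunk_chars": {
        "flags": ("--streaming-chunk-chars",),
        "kwargs": {
            "type": int,
            "default": 4000,
            "metavar": "N",
            "help": "Streaming chunk size in characters (illustrative text)",
        },
    },
    "streaming_overlap_chars": {
        "flags": ("--streaming-overlap-chars",),
        "kwargs": {
            "type": int,
            "default": 200,
            "metavar": "N",
            "help": "Streaming overlap in characters (illustrative text)",
        },
    },
    "chunk_for_rag": {
        "flags": ("--chunk-for-rag",),
        "kwargs": {
            "action": "store_true",
            "help": "Enable intelligent chunking for RAG platforms "
                    "(auto-enabled for RAG adaptors)",
        },
    },
}


def build_parser(arguments: dict[str, dict[str, Any]]) -> argparse.ArgumentParser:
    """Register every registry entry on a fresh parser (assumed wiring)."""
    # allow_abbrev=False stops argparse from accepting unique prefixes, so the
    # old "--chunk" spelling is not silently treated as "--chunk-for-rag".
    parser = argparse.ArgumentParser(prog="example", allow_abbrev=False)
    for dest, spec in arguments.items():
        parser.add_argument(*spec["flags"], dest=dest, **spec["kwargs"])
    return parser


parser = build_parser(ARGUMENTS)
args = parser.parse_args(["--streaming-chunk-chars", "1000", "--chunk-for-rag"])
print(args.streaming_chunk_chars, args.streaming_overlap_chars, args.chunk_for_rag)

# The old flag name is no longer recognized; parse_known_args returns it as a leftover.
_, leftovers = parser.parse_known_args(["--chunk"])
print(leftovers)
```

If the real parser keeps argparse's default prefix matching, a leftover `--chunk` in a script could still resolve to a new flag whenever it is an unambiguous prefix, which is worth checking when migrating callers.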