refactor: rename all chunk flags to include explicit units

Replace the ambiguous --chunk-size / --chunk-overlap names, which meant
different things in different contexts (tokens vs. characters), with
fully explicit names:

- --chunk-size (RAG tokens) → --chunk-tokens
- --chunk-overlap (RAG tokens) → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars) → --streaming-overlap-chars
- --chunk-size (PDF pages) → --pdf-pages-per-chunk (poc file)

Also aligns stream_parser.py help with streaming_ingest.py standalone parser.

All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -380,9 +380,9 @@ def main():
 
     parser = argparse.ArgumentParser(description="Stream and chunk skill documents")
     parser.add_argument("input", help="Input file or directory path")
-    parser.add_argument("--chunk-size", type=int, default=4000, help="Chunk size in characters")
+    parser.add_argument("--streaming-chunk-chars", type=int, default=4000, help="Chunk size in characters")
     parser.add_argument(
-        "--chunk-overlap", type=int, default=200, help="Chunk overlap in characters"
+        "--streaming-overlap-chars", type=int, default=200, help="Chunk overlap in characters"
     )
     parser.add_argument("--batch-size", type=int, default=100, help="Batch size for processing")
     parser.add_argument("--checkpoint", help="Checkpoint file path")
@@ -390,7 +390,7 @@ def main():
 
     # Initialize ingester
     ingester = StreamingIngester(
-        chunk_size=args.chunk_size, chunk_overlap=args.chunk_overlap, batch_size=args.batch_size
+        chunk_size=args.streaming_chunk_chars, chunk_overlap=args.streaming_overlap_chars, batch_size=args.batch_size
     )
 
     # Progress callback
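
The second hunk only works because argparse derives each attribute name from the long flag by replacing dashes with underscores, so `--streaming-chunk-chars` becomes `args.streaming_chunk_chars`. The sketch below illustrates that mapping and shows that the old flag name is rejected after the rename. It is a minimal, standalone approximation: `build_parser` and the sample `docs/` argument are hypothetical, and only the flags visible in this diff are reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper that mirrors only the renamed flags shown in the diff.
    parser = argparse.ArgumentParser(description="Stream and chunk skill documents")
    parser.add_argument("input", help="Input file or directory path")
    parser.add_argument(
        "--streaming-chunk-chars", type=int, default=4000, help="Chunk size in characters"
    )
    parser.add_argument(
        "--streaming-overlap-chars", type=int, default=200, help="Chunk overlap in characters"
    )
    return parser


# New flag name: dashes become underscores on the parsed namespace.
args = build_parser().parse_args(["docs/", "--streaming-chunk-chars", "1000"])
print(args.streaming_chunk_chars, args.streaming_overlap_chars)

# Old flag name: argparse reports an unrecognized argument and exits.
old_flag_exit_code = None
try:
    build_parser().parse_args(["docs/", "--chunk-size", "1000"])
except SystemExit as exc:
    old_flag_exit_code = exc.code
print(old_flag_exit_code)
```

Because the rename keeps no alias for the old spelling, any script still passing `--chunk-size` to this parser fails at argument parsing rather than silently using a default.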