fix(cli): Phase 2.5 - Rename package streaming args for clarity

Problem:
- Same argument names in different commands with different meanings
- --chunk-size: 512 tokens (scrape/create) vs 4000 chars (package)
- --chunk-overlap: 50 tokens (scrape/create) vs 200 chars (package)
- Users expect a flag to behave the same in every command; the mismatch was confusing

Solution:
Renamed package.py streaming arguments to be more specific:
- --chunk-size → --streaming-chunk-size (4000 chars)
- --chunk-overlap → --streaming-overlap (200 chars)
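
To make the character-based units concrete, here is a minimal sketch of what a 4000-char chunk with a 200-char overlap could look like. The function name `chunk_by_chars` and its exact stepping logic are assumptions for illustration, not the streaming code in package.py:

```python
def chunk_by_chars(text, chunk_size=4000, overlap=200):
    """Split text into character windows that share `overlap` chars.

    Hypothetical helper: sizes are counted in characters, not tokens,
    which is the distinction the renamed flags are meant to signal.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # each window starts `step` chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_by_chars("a" * 10000)
print([len(c) for c in chunks])  # [4000, 4000, 2400]
```

With the defaults, a 10000-character input produces windows starting at offsets 0, 3800, and 7600, so consecutive chunks share 200 characters.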

Result:
- Clear distinction: streaming args vs RAG args
- No naming conflicts across commands
- --chunk-size now consistently means "RAG tokens" everywhere
- All 9 package tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2026-02-15 14:52:31 +03:00
parent 13838cb5a9
commit 527ed65cc7

@@ -252,17 +252,17 @@ Examples:
 )
 parser.add_argument(
-"--chunk-size",
+"--streaming-chunk-size",
 type=int,
 default=4000,
-help="Maximum characters per chunk (streaming mode, default: 4000)",
+help="Maximum characters per chunk (streaming mode only, default: 4000)",
 )
 parser.add_argument(
-"--chunk-overlap",
+"--streaming-overlap",
 type=int,
 default=200,
-help="Overlap between chunks for context (streaming mode, default: 200)",
+help="Character overlap between chunks (streaming mode only, default: 200)",
 )
 parser.add_argument(
@@ -300,8 +300,8 @@ Examples:
 skip_quality_check=args.skip_quality_check,
 target=args.target,
 streaming=args.streaming,
-chunk_size=args.chunk_size,
-chunk_overlap=args.chunk_overlap,
+chunk_size=args.streaming_chunk_size,
+chunk_overlap=args.streaming_overlap,
 batch_size=args.batch_size,
 enable_chunking=args.chunk,
 chunk_max_tokens=args.chunk_tokens,
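
The second hunk relies on argparse deriving the attribute name from the long flag by replacing dashes with underscores, which is why `--streaming-chunk-size` is read back as `args.streaming_chunk_size`. A minimal sketch of that mapping, using a stand-in parser (the `prog` name and the `kwargs` dict are assumptions for illustration, not the real package.py setup):

```python
import argparse

# Hypothetical stand-in for the parser in package.py; only the two
# renamed flags from this commit are declared.
parser = argparse.ArgumentParser(prog="package")
parser.add_argument("--streaming-chunk-size", type=int, default=4000)
parser.add_argument("--streaming-overlap", type=int, default=200)

args = parser.parse_args(["--streaming-chunk-size", "2000"])

# The callee keeps its chunk_size / chunk_overlap keyword names;
# only the CLI-facing flag names changed.
kwargs = {
    "chunk_size": args.streaming_chunk_size,
    "chunk_overlap": args.streaming_overlap,
}
print(kwargs)  # {'chunk_size': 2000, 'chunk_overlap': 200}
```

Because only the flag names changed, any script that still passes `--chunk-size` to this parser would be rejected as an unrecognized argument.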