refactor: rename all chunk flags to include explicit units

Replace ambiguous --chunk-size / --chunk-overlap names that meant different
things in different contexts (tokens vs characters) with fully explicit names:

- --chunk-size (RAG tokens)      → --chunk-tokens
- --chunk-overlap (RAG tokens)   → --chunk-overlap-tokens
- --chunk (enable RAG chunking)  → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars)    → --streaming-overlap-chars
- --chunk-size (PDF pages)       → --pdf-pages-per-chunk (poc file)
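
For illustration, the renamed RAG and streaming flags could be declared roughly as below. This is a minimal sketch, not the project's actual parser: the program name, defaults, and help strings are assumptions, and only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the defaults below are illustrative assumptions.
    # allow_abbrev=False matters here: with argparse's default prefix
    # matching, the old --chunk-overlap would still be accepted as an
    # abbreviation of --chunk-overlap-tokens.
    parser = argparse.ArgumentParser(prog="example", allow_abbrev=False)
    parser.add_argument("--chunk-for-rag", action="store_true",
                        help="Enable RAG chunking")
    parser.add_argument("--chunk-tokens", type=int, default=512,
                        help="RAG chunk size, in tokens")
    parser.add_argument("--chunk-overlap-tokens", type=int, default=50,
                        help="Overlap between RAG chunks, in tokens")
    parser.add_argument("--streaming-chunk-chars", type=int, default=4000,
                        help="Streaming chunk size, in characters")
    parser.add_argument("--streaming-overlap-chars", type=int, default=200,
                        help="Overlap between streaming chunks, in characters")
    return parser


args = build_parser().parse_args(["--chunk-for-rag", "--chunk-tokens", "256"])
print(args.chunk_for_rag, args.chunk_tokens, args.chunk_overlap_tokens)
```

argparse maps each flag to an attribute by replacing dashes with underscores, so --chunk-tokens becomes args.chunk_tokens, which matches the attribute names used in the diff.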

Also aligns the stream_parser.py help text with the standalone parser in
streaming_ingest.py.
All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yusyus
2026-02-24 22:07:56 +03:00
parent b636a0a292
commit 7a2ffb286c
12 changed files with 40 additions and 40 deletions

@@ -2227,8 +2227,8 @@ def execute_scraping_and_building(
 from skill_seekers.cli.rag_chunker import RAGChunker
 chunker = RAGChunker(
-    chunk_size=args.chunk_size,
-    chunk_overlap=args.chunk_overlap,
+    chunk_size=args.chunk_tokens,
+    chunk_overlap=args.chunk_overlap_tokens,
     preserve_code_blocks=not args.no_preserve_code_blocks,
     preserve_paragraphs=not args.no_preserve_paragraphs,
 )
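
To make the token units concrete, here is a minimal sketch of how a chunk size and an overlap, both measured in tokens, might interact in a sliding window. It is not RAGChunker's implementation: chunk_by_tokens is a hypothetical helper, and whitespace splitting stands in for a real tokenizer.

```python
def chunk_by_tokens(tokens, chunk_tokens, chunk_overlap_tokens):
    """Split a token list into windows of chunk_tokens that share
    chunk_overlap_tokens tokens with the previous window."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    if not 0 <= chunk_overlap_tokens < chunk_tokens:
        raise ValueError("overlap must be >= 0 and smaller than chunk_tokens")

    step = chunk_tokens - chunk_overlap_tokens  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_tokens])
        if start + chunk_tokens >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Whitespace "tokens" are an assumption made only for this example.
tokens = "a b c d e f g h".split()
chunks = chunk_by_tokens(tokens, chunk_tokens=4, chunk_overlap_tokens=1)
print(chunks)
```

Both arguments count the same unit, which is the ambiguity the rename removes: a value of 4 here means four tokens, not four characters.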