docs: update all chunk flag names to match renamed CLI flags

Replace all occurrences of old ambiguous flag names with the new explicit ones:
  --chunk-size (tokens)  → --chunk-tokens
  --chunk-overlap        → --chunk-overlap-tokens
  --chunk                → --chunk-for-rag
  --streaming-chunk-size → --streaming-chunk-chars
  --streaming-overlap    → --streaming-overlap-chars
  --chunk-size (pages)   → --pdf-pages-per-chunk
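
The mapping above can be sketched as a small rewrite helper. This is only an illustration of the rename table, not the tooling used for this commit. The names `RENAMES`, `CHUNK_SIZE_TARGETS`, `rename_flags`, and `chunk_size_unit` are hypothetical. Because the old `--chunk-size` meant either tokens or pages, the caller is assumed to state which meaning applies:

```python
import re

# Unambiguous renames, copied from the list in this commit message.
RENAMES = {
    "--chunk-overlap": "--chunk-overlap-tokens",
    "--chunk": "--chunk-for-rag",
    "--streaming-chunk-size": "--streaming-chunk-chars",
    "--streaming-overlap": "--streaming-overlap-chars",
}

# The old --chunk-size was ambiguous; the new name depends on the unit.
CHUNK_SIZE_TARGETS = {
    "tokens": "--chunk-tokens",
    "pages": "--pdf-pages-per-chunk",
}

# Match a whole long option (e.g. --chunk-for-rag), never a prefix of one,
# so "--chunk" is not rewritten inside "--chunk-for-rag".
FLAG_RE = re.compile(r"(?<![\w-])--[a-z]+(?:-[a-z]+)*")


def rename_flags(line: str, chunk_size_unit: str = "tokens") -> str:
    """Return `line` with old chunk flags replaced by the new names."""

    def replace(match: re.Match) -> str:
        flag = match.group(0)
        if flag == "--chunk-size":
            return CHUNK_SIZE_TARGETS[chunk_size_unit]
        # Flags that are not in the table (including new names) pass through.
        return RENAMES.get(flag, flag)

    return FLAG_RE.sub(replace, line)


if __name__ == "__main__":
    old = "skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512"
    print(rename_flags(old))
```

Matching whole flags keeps the helper idempotent: a line that already uses the new names is returned unchanged.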

Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yusyus
2026-02-24 22:15:14 +03:00
parent 7a2ffb286c
commit 73adda0b17
29 changed files with 488 additions and 214 deletions


@@ -223,7 +223,7 @@ skill-seekers package output/codebase --target langchain
**Option D: RAG-Optimized Chunking**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
skill-seekers package output/fastapi --target langchain
```
@@ -968,7 +968,7 @@ collection.add(
2. **Implement Semantic Chunking:**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
```
3. **Set Up Multi-Collection Search:**


@@ -255,7 +255,7 @@ skill-seekers package output/codebase --target langchain
**Option D: RAG-Optimized Chunking**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
skill-seekers package output/fastapi --target langchain
```


@@ -318,8 +318,8 @@ print(response["llm"]["replies"][0])
# Enable semantic chunking (preserves code blocks, respects paragraphs)
skill-seekers scrape --config configs/django.json \
--chunk-for-rag \
- --chunk-size 512 \
- --chunk-overlap 50
+ --chunk-tokens 512 \
+ --chunk-overlap-tokens 50
# Package chunked output
skill-seekers package output/django --target haystack
@@ -439,8 +439,8 @@ python scripts/merge_documents.py \
# Enable chunking for frameworks with long pages
skill-seekers scrape --config configs/django.json \
--chunk-for-rag \
- --chunk-size 512 \
- --chunk-overlap 50
+ --chunk-tokens 512 \
+ --chunk-overlap-tokens 50
```
### 2. Choose Right Document Store
@@ -506,8 +506,8 @@ Complete example of building a FastAPI documentation chatbot:
# Scrape FastAPI docs with chunking
skill-seekers scrape --config configs/fastapi.json \
--chunk-for-rag \
- --chunk-size 512 \
- --chunk-overlap 50 \
+ --chunk-tokens 512 \
+ --chunk-overlap-tokens 50 \
--max-pages 200
# Package for Haystack
@@ -698,8 +698,8 @@ skill-seekers scrape --config configs/fastapi.json --chunk-for-rag
# 2. Adjust chunk size
skill-seekers scrape --config configs/fastapi.json \
--chunk-for-rag \
- --chunk-size 768 \ # Larger chunks for more context
- --chunk-overlap 100 # More overlap for continuity
+ --chunk-tokens 768 \ # Larger chunks for more context
+ --chunk-overlap-tokens 100 # More overlap for continuity
# 3. Use hybrid search (BM25 + embeddings)
# See Advanced Usage section


@@ -270,7 +270,7 @@ skill-seekers package output/codebase --target langchain
**Option D: RAG-Optimized Chunking**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
skill-seekers package output/fastapi --target langchain
```


@@ -210,7 +210,7 @@ skill-seekers package output/codebase --target langchain
**Option D: RAG-Optimized Chunking**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
skill-seekers package output/fastapi --target langchain
```
@@ -960,7 +960,7 @@ print(schema.get("multiTenancyConfig", {}).get("enabled")) # Should be True
2. **Implement Semantic Chunking:**
```bash
-skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-size 512
+skill-seekers scrape --config configs/fastapi.json --chunk-for-rag --chunk-tokens 512
```
3. **Set Up Multi-Tenancy:**