yusyus
37cb307455
docs: update all documentation for 17 source types
...
Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.
Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above
32 files changed, +3,016 lines, -245 lines
2026-03-15 15:56:04 +03:00
yusyus
4c8e16c8b1
fix( #300 ): centralize selector fallback, fix dry-run link discovery, and smart --config routing
...
- Add FALLBACK_MAIN_SELECTORS constant and _find_main_content() helper to
eliminate 3 duplicated fallback loops in doc_scraper.py
- Move link extraction before early return in extract_content() so links
are always discovered from the full page, not just main content
- Fix single-threaded dry-run to extract links from soup (full page)
instead of main element only — fixes reactflow.dev finding only 1 page
- Add link extraction to async dry-run path (was completely missing)
- Remove main_content from get_configuration() defaults so fallback logic
kicks in instead of a broad CSS comma selector matching body
- Smart create --config routing: peek at JSON to determine unified
(sources array → unified_scraper) vs simple (base_url → doc_scraper)
- Update docs/user-guide/02-scraping.md and docs/reference/CONFIG_FORMAT.md
to use unified config format (legacy format rejected since v2.11.0)
- Fix test_auto_fetch_enabled and test_mcp_validate_legacy_config
Closes #300
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-26 22:25:59 +03:00
yusyus
73adda0b17
docs: update all chunk flag names to match renamed CLI flags
...
Replace all occurrences of old ambiguous flag names with the new explicit ones:
--chunk-size (tokens) → --chunk-tokens
--chunk-overlap → --chunk-overlap-tokens
--chunk → --chunk-for-rag
--streaming-chunk-size → --streaming-chunk-chars
--streaming-overlap → --streaming-overlap-chars
--chunk-size (pages) → --pdf-pages-per-chunk
Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-02-24 22:15:14 +03:00
yusyus
ba9a8ff8b5
docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
...
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/
Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)
Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config
Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-02-22 01:01:51 +03:00