docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md: taglines, feature lists, examples, install extras
- docs/reference/: CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/: UNIFIED_SCRAPING with generic merge docs
- docs/advanced/: multi-source guide, MCP server guide
- docs/getting-started/: installation extras, quick-start examples
- docs/user-guide/: core-concepts, scraping, packaging, workflows (complex-merge)
- docs/: FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root: BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/: Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
2026-03-15 15:56:04 +03:00
parent 53b911b697
commit 37cb307455
32 changed files with 3011 additions and 240 deletions
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -441,21 +441,46 @@ def test_config_validation_with_missing_fields():

 ```
 Skill_Seekers/
-├── cli/                    # CLI tools
-│   ├── doc_scraper.py     # Main scraper
-│   ├── package_skill.py   # Packager
-│   ├── upload_skill.py    # Uploader
-│   └── utils.py           # Shared utilities
-├── mcp/                   # MCP server
-│   ├── server.py          # MCP implementation
-│   └── requirements.txt   # MCP dependencies
-├── configs/               # Framework configs
-├── docs/                  # Documentation
-├── tests/                 # Test suite
-└── .github/              # GitHub config
-    └── workflows/         # CI/CD workflows
+├── src/skill_seekers/      # Main package (src/ layout)
+│   ├── cli/                # CLI commands and entry points
+│   │   ├── main.py         # Unified CLI entry (COMMAND_MODULES dict)
+│   │   ├── source_detector.py  # Auto-detects source type
+│   │   ├── create_command.py   # Unified `create` command routing
+│   │   ├── config_validator.py # VALID_SOURCE_TYPES set
+│   │   ├── unified_scraper.py  # Multi-source orchestrator
+│   │   ├── unified_skill_builder.py # Pairwise synthesis + generic merge
+│   │   ├── doc_scraper.py      # Documentation (web)
+│   │   ├── github_scraper.py   # GitHub repos
+│   │   ├── pdf_scraper.py      # PDF files
+│   │   ├── word_scraper.py     # Word (.docx)
+│   │   ├── epub_scraper.py     # EPUB books
+│   │   ├── video_scraper.py    # Video (YouTube, Vimeo, local)
+│   │   ├── codebase_scraper.py # Local codebases
+│   │   ├── jupyter_scraper.py  # Jupyter Notebooks
+│   │   ├── html_scraper.py     # Local HTML files
+│   │   ├── openapi_scraper.py  # OpenAPI/Swagger specs
+│   │   ├── asciidoc_scraper.py # AsciiDoc files
+│   │   ├── pptx_scraper.py     # PowerPoint files
+│   │   ├── rss_scraper.py      # RSS/Atom feeds
+│   │   ├── manpage_scraper.py  # Man pages
+│   │   ├── confluence_scraper.py # Confluence wikis
+│   │   ├── notion_scraper.py   # Notion pages
+│   │   ├── chat_scraper.py     # Slack/Discord exports
+│   │   ├── adaptors/          # Platform adaptors (Strategy pattern)
+│   │   ├── arguments/         # CLI argument definitions (one per source)
+│   │   ├── parsers/           # Subcommand parsers (one per source)
+│   │   └── storage/           # Cloud storage adaptors
+│   ├── mcp/                # MCP server + tools
+│   └── sync/               # Sync monitoring
+├── configs/                # Preset JSON scraping configs
+├── docs/                   # Documentation
+├── tests/                  # 115+ test files (pytest)
+└── .github/               # GitHub config
+    └── workflows/          # CI/CD workflows
 ```

+**Scraper pattern (17 source types):** Each source type has `cli/<type>_scraper.py` (with `<Type>ToSkillConverter` class + `main()`), `arguments/<type>.py`, and `parsers/<type>_parser.py`. Register new types in: `parsers/__init__.py` PARSERS list, `main.py` COMMAND_MODULES dict, `config_validator.py` VALID_SOURCE_TYPES set.
+
 ---

 ## Release Process
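
The "Scraper pattern" note added to CONTRIBUTING.md in the hunk above names three places where a new source type must be registered: the `PARSERS` list, the `COMMAND_MODULES` dict, and the `VALID_SOURCE_TYPES` set. The sketch below is only an illustration of that checklist using plain in-memory containers. The container contents, the dotted module-path value format, and the helpers `register_source_type` and `missing_registrations` are assumptions made for the example, not code from the repository.

```python
# Hypothetical stand-ins for the three registries named in CONTRIBUTING.md.
# The real objects live in config_validator.py, main.py, and parsers/__init__.py;
# their actual contents and value formats may differ from what is assumed here.
VALID_SOURCE_TYPES = {"documentation", "github", "pdf"}
COMMAND_MODULES = {"pdf": "skill_seekers.cli.pdf_scraper"}
PARSERS = ["pdf_parser"]


def missing_registrations(name: str) -> list[str]:
    """Return the names of the registries that do not yet know about `name`."""
    missing = []
    if name not in VALID_SOURCE_TYPES:
        missing.append("VALID_SOURCE_TYPES")
    if name not in COMMAND_MODULES:
        missing.append("COMMAND_MODULES")
    if f"{name}_parser" not in PARSERS:
        missing.append("PARSERS")
    return missing


def register_source_type(name: str) -> None:
    """Add `name` to all three registries (illustrative helper, not a project API)."""
    VALID_SOURCE_TYPES.add(name)
    # Assumed convention: the module path mirrors cli/<type>_scraper.py.
    COMMAND_MODULES[name] = f"skill_seekers.cli.{name}_scraper"
    parser = f"{name}_parser"
    if parser not in PARSERS:
        PARSERS.append(parser)


if __name__ == "__main__":
    print(missing_registrations("rss"))  # lists every registry still to update
    register_source_type("rss")
    print(missing_registrations("rss"))  # empty once all three are updated
```

A check along the lines of `missing_registrations` could be turned into a test that fails whenever a scraper is added to one registry but not the others, which is the mistake the three-step checklist is meant to prevent.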