docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
yusyus
2026-03-15 15:56:04 +03:00
parent 53b911b697
commit 37cb307455
32 changed files with 3011 additions and 240 deletions


@@ -434,6 +434,53 @@ That's it! Follow these practices and your skills will work better with Claude.
---
## 8. Tips for Specific Source Types
Skill Seekers supports **17 source types**. Here are tips for getting the best results from each category:
### Documentation (Web)
- Always test CSS selectors before large scrapes: `skill-seekers scrape --max-pages 3 --verbose`
- Use `--async` for large sites (2-3x faster)
### GitHub Repos
- Use `--analysis-depth c3x` for deep analysis (patterns, tests, architecture)
- Set `GITHUB_TOKEN` to avoid rate limits
### PDFs & Office Documents (PDF, Word, EPUB, PPTX)
- Use `--enable-ocr` for scanned PDFs
- For Word/PPTX, embedded images are extracted automatically; for PDFs, add `--extract-images`
- EPUB works best with DRM-free files
### Video
- Run `skill-seekers video --setup` first to install GPU-optimized dependencies
- YouTube and Vimeo URLs are auto-detected; local video files also work
### Jupyter Notebooks
- Ensure notebooks are saved (unsaved cell outputs won't be captured)
- Both code cells and markdown cells are extracted
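Before scraping, it can help to check whether a notebook was saved with its outputs. The sketch below is not part of Skill Seekers; it only assumes the standard nbformat 4 JSON layout (`cells`, `cell_type`, `source`, `outputs`), and the helper name `cells_missing_output` is hypothetical. A cell that legitimately prints nothing (for example, a plain assignment) will also be reported, so treat the result as a hint rather than an error.

```python
import json
from typing import List


def cells_missing_output(notebook_json: str) -> List[int]:
    """Return indices of non-empty code cells that have no saved outputs.

    Assumes the nbformat 4 layout; the function name is illustrative only.
    """
    nb = json.loads(notebook_json)
    missing = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never carry outputs
        # "source" may be a string or a list of strings; join handles both
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing
```

To use it, read the `.ipynb` file as text and pass the contents in; an empty list suggests every code cell was saved with output.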
### OpenAPI/Swagger Specs
- Both YAML and JSON specs are supported (OpenAPI 3.x and Swagger 2.0)
- Endpoints, schemas, and examples are parsed into a structured API reference
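To confirm which flavor a spec file is before scraping, a minimal sketch follows. It relies only on the top-level `openapi` and `swagger` version fields defined by the two specifications and says nothing about how Skill Seekers detects them internally; `detect_spec_flavor` is a hypothetical helper. It takes an already-parsed dict, so a YAML spec would first need a YAML parser (for example PyYAML, which is not in the standard library).

```python
import json


def detect_spec_flavor(spec: dict) -> str:
    """Classify a parsed spec by its top-level version field.

    Illustrative helper only; not the tool's own detection logic.
    """
    # OpenAPI 3.x documents carry an "openapi" field such as "3.0.3"
    if str(spec.get("openapi", "")).startswith("3."):
        return "openapi-3"
    # Swagger 2.0 documents carry a "swagger" field that must be "2.0"
    if str(spec.get("swagger", "")) == "2.0":
        return "swagger-2"
    return "unknown"


if __name__ == "__main__":
    example = json.loads('{"openapi": "3.0.3", "info": {"title": "Demo"}}')
    print(detect_spec_flavor(example))
```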
### AsciiDoc & Man Pages
- AsciiDoc requires `asciidoctor` (install it via your package manager or with `gem`)
- Man pages in sections `.1` through `.8` are supported
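To pre-filter a directory for man pages in the supported sections, something like the following could work. It is a sketch under assumptions: it matches file names ending in `.1` through `.8`, optionally gzip-compressed, and the helper `man_section` is hypothetical. Whether the tool itself accepts `.gz` files or suffixed sections such as `.1ssl` is not stated here, so this sketch accepts the former and rejects the latter.

```python
import re
from typing import Optional

# Matches a trailing ".1" to ".8", optionally followed by ".gz"
_MAN_SUFFIX = re.compile(r"\.([1-8])(?:\.gz)?$")


def man_section(filename: str) -> Optional[int]:
    """Return the man section (1-8) implied by the file name, else None.

    Illustrative helper only; the name and matching rules are assumptions.
    """
    match = _MAN_SUFFIX.search(filename)
    return int(match.group(1)) if match else None
```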
### RSS/Atom Feeds
- Useful for converting blog posts and changelogs into skills
- Set `--max-items` to limit how many entries are extracted
### Confluence & Notion
- API mode requires authentication tokens (see FAQ for setup)
- Export directory mode works offline with HTML/Markdown exports
### Slack & Discord
- Use official export tools (Slack Workspace Export, DiscordChatExporter)
- Specify `--platform slack` or `--platform discord` explicitly
---
## See Also
- [Enhancement Guide](features/ENHANCEMENT.md) - AI-powered SKILL.md improvement