docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md - taglines, feature lists, examples, install extras
- docs/reference/ - CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ - UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ - multi-source guide, MCP server guide
- docs/getting-started/ - installation extras, quick-start examples
- docs/user-guide/ - core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ - FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root - BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ - Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
2026-03-15 15:56:04 +03:00
parent 53b911b697
commit 37cb307455
32 changed files with 3011 additions and 240 deletions
--- a/docs/BEST_PRACTICES.md
+++ b/docs/BEST_PRACTICES.md
@@ -434,6 +434,53 @@ That's it! Follow these practices and your skills will work better with Claude.

 ---

+## 8. Tips for Specific Source Types
+
+Skill Seekers supports **17 source types**. Here are tips for getting the best results from each category:
+
+### Documentation (Web)
+- Always test CSS selectors before large scrapes: `skill-seekers scrape --max-pages 3 --verbose`
+- Use `--async` for large sites (2-3x faster)
+
+### GitHub Repos
+- Use `--analysis-depth c3x` for deep analysis (patterns, tests, architecture)
+- Set `GITHUB_TOKEN` to avoid rate limits
+
+### PDFs & Office Documents (PDF, Word, EPUB, PPTX)
+- Use `--enable-ocr` for scanned PDFs
+- For Word/PPTX, embedded images are extracted automatically; add `--extract-images` for PDFs
+- EPUB works best with DRM-free files
+
+### Video
+- Run `skill-seekers video --setup` first to install GPU-optimized dependencies
+- YouTube and Vimeo URLs are auto-detected; local video files also work
+
+### Jupyter Notebooks
+- Ensure notebooks are saved (unsaved cell outputs won't be captured)
+- Both code cells and markdown cells are extracted
+
+### OpenAPI/Swagger Specs
+- Both YAML and JSON specs are supported (OpenAPI 3.x and Swagger 2.0)
+- Endpoints, schemas, and examples are parsed into a structured API reference
+
+### AsciiDoc & Man Pages
+- AsciiDoc requires `asciidoctor` (install via your package manager or `gem install asciidoctor`)
+- Man pages in sections `.1` through `.8` are supported
+
+### RSS/Atom Feeds
+- Useful for converting blog posts and changelogs into skills
+- Set `--max-items` to limit how many entries are extracted
+
+### Confluence & Notion
+- API mode requires authentication tokens (see the FAQ for setup)
+- Export directory mode works offline with HTML/Markdown exports
+
+### Slack & Discord
+- Use official export tools (Slack Workspace Export, DiscordChatExporter)
+- Specify `--platform slack` or `--platform discord` explicitly
+
+---
+
 ## See Also

 - [Enhancement Guide](features/ENHANCEMENT.md) - AI-powered SKILL.md improvement
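
Reviewer sketch (not part of this commit): several of the tips added in the hunk above describe preconditions that can be checked before a run. The Python below is a minimal, illustrative pre-flight sketch under stated assumptions. Every function name is hypothetical and none of it is Skill Seekers code. It assumes man pages are named with a bare section suffix such as `ls.1`, that specs and notebooks have already been parsed into dictionaries, and that notebooks follow the nbformat v4 JSON layout.

```python
import os
import re
from typing import List, Mapping, Optional

# All helpers here are hypothetical illustrations, not Skill Seekers APIs.

# Matches a bare man-page section suffix: ".1" through ".8".
# Compressed pages such as "ls.1.gz" are deliberately not handled here.
_MAN_SECTION = re.compile(r"\.([1-8])$")


def man_section(filename: str) -> Optional[int]:
    """Return the section number (1-8) from a name like 'ls.1', else None."""
    match = _MAN_SECTION.search(filename)
    return int(match.group(1)) if match else None


def spec_kind(spec: Mapping) -> Optional[str]:
    """Classify a parsed spec by its top-level version field."""
    if str(spec.get("openapi", "")).startswith("3."):
        return "openapi-3"
    if str(spec.get("swagger", "")) == "2.0":
        return "swagger-2"
    return None


def cells_missing_output(notebook: Mapping) -> List[int]:
    """Indices of non-empty code cells that have no saved outputs.

    A cell can legitimately produce no output (for example, a plain
    assignment), so treat the result as a hint, not an error.
    """
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source as either a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


def github_token_set(env: Mapping = os.environ) -> bool:
    """True when GITHUB_TOKEN is present and non-blank."""
    return bool(env.get("GITHUB_TOKEN", "").strip())
```

A notebook or JSON spec can be loaded with `json.load` and passed straight to these helpers; a YAML spec would need a YAML parser first, which this sketch leaves out.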