docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
Author: yusyus
Date: 2026-03-15 15:56:04 +03:00
Commit: 37cb307455 (parent: 53b911b697)

@@ -1,7 +1,7 @@
# Frequently Asked Questions (FAQ)
**Version:** 3.2.0
**Last Updated:** 2026-03-15
---
@@ -9,13 +9,17 @@
### What is Skill Seekers?
Skill Seekers is a Python tool that converts 17 source types — documentation websites, GitHub repos, PDFs, videos, Word docs, EPUB books, Jupyter notebooks, local HTML files, OpenAPI specs, AsciiDoc, PowerPoint, RSS/Atom feeds, man pages, Confluence wikis, Notion pages, Slack/Discord exports, and local codebases — into AI-ready formats for 16+ platforms: LLM platforms (Claude, Gemini, OpenAI), RAG frameworks (LangChain, LlamaIndex, Haystack), vector databases (ChromaDB, FAISS, Weaviate, Qdrant, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline, Continue.dev).
**Use Cases:**
- Create custom documentation skills for your favorite frameworks
- Analyze GitHub repositories and extract code patterns
- Convert PDF manuals into searchable AI skills
- Import knowledge from Confluence, Notion, or Slack/Discord
- Extract content from videos (YouTube, Vimeo, local files)
- Convert Jupyter notebooks, EPUB books, or PowerPoint slides into skills
- Parse OpenAPI/Swagger specs into API reference skills
- Combine multiple sources (docs + code + PDFs + more) into unified skills
### Which platforms are supported?
@@ -77,12 +81,43 @@ The `--setup` command auto-detects your GPU vendor (NVIDIA CUDA, AMD ROCm, or CP
- **AMD:** Uses `rocminfo` to find ROCm version → installs matching ROCm PyTorch
- **CPU-only:** Installs lightweight CPU-only PyTorch
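The vendor probe behind those three branches can be sketched in a few lines of shell. This is a simplified illustration of the same checks, not the tool's actual implementation (which also parses driver and ROCm versions to pick a matching PyTorch build):

```shell
# Hypothetical sketch of the vendor check --setup performs: probe for the
# NVIDIA and AMD CLI tools, then fall back to CPU-only.
detect_gpu_vendor() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "cuda"   # NVIDIA driver stack present
  elif command -v rocminfo >/dev/null 2>&1; then
    echo "rocm"   # AMD ROCm stack present
  else
    echo "cpu"    # neither found: lightweight CPU-only PyTorch
  fi
}

detect_gpu_vendor
```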
### What source types are supported?
Skill Seekers supports **17 source types**:
| # | Source Type | CLI Command | Auto-Detection |
|---|------------|-------------|----------------|
| 1 | Documentation (web) | `scrape` / `create <url>` | HTTP/HTTPS URLs |
| 2 | GitHub repo | `github` / `create owner/repo` | `owner/repo` or github.com URLs |
| 3 | PDF | `pdf` / `create file.pdf` | `.pdf` extension |
| 4 | Word (.docx) | `word` / `create file.docx` | `.docx` extension |
| 5 | EPUB | `epub` / `create file.epub` | `.epub` extension |
| 6 | Video | `video` / `create <url/file>` | YouTube/Vimeo URLs, video extensions |
| 7 | Local codebase | `analyze` / `create ./path` | Directory paths |
| 8 | Jupyter Notebook | `jupyter` / `create file.ipynb` | `.ipynb` extension |
| 9 | Local HTML | `html` / `create file.html` | `.html`/`.htm` extensions |
| 10 | OpenAPI/Swagger | `openapi` / `create spec.yaml` | `.yaml`/`.yml` with OpenAPI content |
| 11 | AsciiDoc | `asciidoc` / `create file.adoc` | `.adoc`/`.asciidoc` extensions |
| 12 | PowerPoint | `pptx` / `create file.pptx` | `.pptx` extension |
| 13 | RSS/Atom | `rss` / `create feed.rss` | `.rss`/`.atom` extensions |
| 14 | Man pages | `manpage` / `create cmd.1` | `.1`-`.8`/`.man` extensions |
| 15 | Confluence | `confluence` | API or export directory |
| 16 | Notion | `notion` | API or export directory |
| 17 | Slack/Discord | `chat` | Export directory or API |
The `create` command auto-detects the source type from your input, so you often don't need to specify a subcommand.
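The file-extension half of that auto-detection column can be approximated with a shell `case` statement. This is a simplified sketch for illustration only; the real `create` command also inspects URLs, directory paths, and file contents (e.g. to tell a plain YAML file from an OpenAPI spec):

```shell
# Simplified extension-based source detection, mirroring the table above.
# Content sniffing and owner/repo detection are omitted.
detect_source_type() {
  case "$1" in
    *.pdf)             echo "pdf" ;;
    *.docx)            echo "word" ;;
    *.epub)            echo "epub" ;;
    *.ipynb)           echo "jupyter" ;;
    *.html|*.htm)      echo "html" ;;
    *.adoc|*.asciidoc) echo "asciidoc" ;;
    *.pptx)            echo "pptx" ;;
    *.rss|*.atom)      echo "rss" ;;
    *.[1-8]|*.man)     echo "manpage" ;;
    http://*|https://*) echo "scrape" ;;
    *)                 echo "unknown" ;;
  esac
}

detect_source_type manual.pdf   # → pdf
detect_source_type grep.1       # → manpage
```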
### How long does it take to create a skill?
**Typical Times:**
- Documentation scraping: 5-45 minutes (depends on size)
- GitHub analysis: 1-5 minutes (basic) or 20-60 minutes (C3.x deep analysis)
- PDF extraction: 30 seconds - 5 minutes
- Video extraction: 2-10 minutes (depends on length and visual analysis)
- Word/EPUB/PPTX: 10-60 seconds
- Jupyter notebook: 10-30 seconds
- OpenAPI spec: 5-15 seconds
- Confluence/Notion import: 1-5 minutes (depends on space size)
- AI enhancement: 30-60 seconds (LOCAL or API mode)
- Total workflow: 10-60 minutes
@@ -214,6 +249,92 @@
```bash
skill-seekers pdf scanned.pdf --enable-ocr
skill-seekers pdf document.pdf --extract-images --extract-tables
```
### How do I scrape a Jupyter Notebook?
```bash
# Extract cells, outputs, and markdown from a notebook
skill-seekers jupyter analysis.ipynb --name data-analysis
# Or use auto-detection
skill-seekers create analysis.ipynb
```
Jupyter extraction preserves code cells, markdown cells, and cell outputs. It works with `.ipynb` files from JupyterLab, Google Colab, and other notebook environments.
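Under the hood, a `.ipynb` file is plain JSON, so you can inspect the cell structure the extractor works with directly. The snippet below builds a minimal two-cell notebook and prints its code cells; it shows the file format, not the tool's actual code path:

```shell
# A .ipynb file is JSON: build a tiny notebook, then print its code cells.
cat > /tmp/demo.ipynb <<'EOF'
{"cells": [
  {"cell_type": "markdown", "source": ["# Title"]},
  {"cell_type": "code", "source": ["print(1 + 1)"], "outputs": []}
], "nbformat": 4, "nbformat_minor": 5}
EOF

python3 - /tmp/demo.ipynb <<'PY'
import json, sys

nb = json.load(open(sys.argv[1]))
for cell in nb["cells"]:
    if cell["cell_type"] == "code":
        print("".join(cell["source"]))   # prints: print(1 + 1)
PY
```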
### How do I import from Confluence or Notion?
**Confluence:**
```bash
# From Confluence Cloud API
export CONFLUENCE_URL=https://yourorg.atlassian.net
export CONFLUENCE_TOKEN=your-api-token
export CONFLUENCE_EMAIL=your-email@example.com
skill-seekers confluence --space MYSPACE --name my-wiki
# From a Confluence HTML/XML export directory
skill-seekers confluence --export-dir ./confluence-export --name my-wiki
```
**Notion:**
```bash
# From Notion API
export NOTION_TOKEN=secret_...
skill-seekers notion --database DATABASE_ID --name my-notes
# From a Notion HTML/Markdown export directory
skill-seekers notion --export-dir ./notion-export --name my-notes
```
### How do I convert Word, EPUB, or PowerPoint files?
```bash
# Word document
skill-seekers word report.docx --name quarterly-report
# EPUB book
skill-seekers epub handbook.epub --name dev-handbook
# PowerPoint presentation
skill-seekers pptx slides.pptx --name training-deck
# Or use auto-detection for any of them
skill-seekers create report.docx
skill-seekers create handbook.epub
skill-seekers create slides.pptx
```
### How do I parse an OpenAPI/Swagger spec?
```bash
# From a local YAML/JSON file
skill-seekers openapi api-spec.yaml --name my-api
# Auto-detection works too
skill-seekers create api-spec.yaml
```
OpenAPI extraction parses endpoints, schemas, parameters, and examples into a structured API reference skill.
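The structure being walked is the spec's `paths` object: each path maps HTTP methods to operations. A self-contained sketch of that traversal on a minimal JSON spec (illustrative only, not the extractor's actual code):

```shell
# Minimal OpenAPI spec in JSON, then list its endpoints: roughly the
# structure walked when building an API reference skill.
cat > /tmp/spec.json <<'EOF'
{"openapi": "3.0.0",
 "info": {"title": "Demo", "version": "1.0"},
 "paths": {
   "/users": {"get": {"summary": "List users"}},
   "/users/{id}": {"get": {"summary": "Get one user"}}
 }}
EOF

python3 - /tmp/spec.json <<'PY'
import json, sys

spec = json.load(open(sys.argv[1]))
for path, methods in spec["paths"].items():
    for method, op in methods.items():
        print(f"{method.upper()} {path}: {op.get('summary', '')}")
PY
```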
### How do I extract content from RSS feeds or man pages?
```bash
# RSS/Atom feed
skill-seekers rss https://blog.example.com/feed.xml --name blog-feed
# Man page
skill-seekers manpage grep.1 --name grep-manual
```
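Feed extraction boils down to walking the XML `item` (or Atom `entry`) list. A self-contained sketch on a minimal RSS 2.0 feed, using only the Python standard library (not the tool's actual parser):

```shell
# Minimal RSS 2.0 feed, then print each item's title.
cat > /tmp/feed.xml <<'EOF'
<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Demo Blog</title>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>
EOF

python3 - /tmp/feed.xml <<'PY'
import sys
import xml.etree.ElementTree as ET

root = ET.parse(sys.argv[1]).getroot()
for item in root.iter("item"):
    print(item.findtext("title"))   # prints: First post / Second post
PY
```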
### How do I import from Slack or Discord?
```bash
# From a Slack export directory
skill-seekers chat --platform slack --export-dir ./slack-export --name team-knowledge
# From a Discord export directory
skill-seekers chat --platform discord --export-dir ./discord-export --name server-archive
```
### Can I combine multiple sources?
Yes! Unified multi-source scraping:
@@ -704,6 +825,6 @@ Yes!
---
**Version:** 3.2.0
**Last Updated:** 2026-03-15
**Questions? Ask on [GitHub Discussions](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)**