docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
Author: yusyus
Date: 2026-03-15 15:56:04 +03:00
Commit: 37cb307455 (parent 53b911b697)
32 changed files with 3011 additions and 240 deletions


@@ -1,7 +1,7 @@
# API Reference - Programmatic Usage
**Version:** 3.1.0-dev
**Last Updated:** 2026-02-18
**Version:** 3.2.0
**Last Updated:** 2026-03-15
**Status:** ✅ Production Ready
---
@@ -217,7 +217,7 @@ skill_path = scrape_pdf(
### 4. Unified Multi-Source Scraping API
Combine multiple sources (docs + GitHub + PDF) into a single unified skill.
Combine multiple sources (any of 17 supported types) into a single unified skill.
#### Unified Scraping
@@ -552,27 +552,47 @@ Skill Seekers uses JSON configuration files to define scraping behavior.
### Unified Config Schema (Multi-Source)
Supports all 17 source types: `documentation`, `github`, `pdf`, `local`, `word`, `video`, `epub`, `jupyter`, `html`, `openapi`, `asciidoc`, `pptx`, `rss`, `manpage`, `confluence`, `notion`, `chat`.
```json
{
"name": "framework-unified",
"description": "Complete framework documentation",
"sources": {
"documentation": {
"type": "docs",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.example.com/",
"selectors": { "main_content": "article" }
},
"github": {
{
"type": "github",
"repo_url": "https://github.com/org/repo",
"analysis_depth": "c3x"
"repo": "org/repo",
"include_code": true,
"code_analysis_depth": "deep"
},
"pdf": {
{
"type": "pdf",
"pdf_path": "manual.pdf",
"enable_ocr": true
"path": "manual.pdf"
},
{
"type": "openapi",
"path": "specs/openapi.yaml"
},
{
"type": "video",
"url": "https://www.youtube.com/watch?v=example"
},
{
"type": "jupyter",
"path": "notebooks/examples.ipynb"
},
{
"type": "confluence",
"base_url": "https://company.atlassian.net/wiki",
"space_key": "DOCS"
}
},
],
"conflict_resolution": "prefer_code",
"merge_strategy": "smart"
}
```
@@ -961,7 +981,21 @@ monitor_enhancement('output/react/', watch=True)
| **Documentation Scraping** | `doc_scraper` | Extract from docs websites |
| **GitHub Analysis** | `github_scraper` | Analyze code repositories |
| **PDF Extraction** | `pdf_scraper` | Extract from PDF files |
| **Unified Scraping** | `unified_scraper` | Multi-source scraping |
| **Word Extraction** | `word_scraper` | Extract from .docx files |
| **EPUB Extraction** | `epub_scraper` | Extract from .epub files |
| **Video Transcription** | `video_scraper` | Extract from YouTube/Vimeo/local videos |
| **Jupyter Extraction** | `jupyter_scraper` | Extract from .ipynb notebooks |
| **HTML Extraction** | `html_scraper` | Extract from local HTML files |
| **OpenAPI Parsing** | `openapi_scraper` | Parse OpenAPI/Swagger specs |
| **AsciiDoc Extraction** | `asciidoc_scraper` | Extract from .adoc files |
| **PowerPoint Extraction** | `pptx_scraper` | Extract from .pptx files |
| **RSS/Atom Extraction** | `rss_scraper` | Extract from RSS/Atom feeds |
| **Man Page Extraction** | `manpage_scraper` | Extract from Unix man pages |
| **Confluence Extraction** | `confluence_scraper` | Extract from Confluence wikis |
| **Notion Extraction** | `notion_scraper` | Extract from Notion workspaces |
| **Chat Extraction** | `chat_scraper` | Extract from Slack/Discord exports |
| **Local Codebase Analysis** | `codebase_scraper` | Analyze local directories |
| **Unified Scraping** | `unified_scraper` | Multi-source scraping (17 types) |
| **Skill Packaging** | `adaptors` | Package for LLM platforms |
| **Skill Upload** | `adaptors` | Upload to platforms |
| **AI Enhancement** | `adaptors` | Improve skill quality |
@@ -979,6 +1013,6 @@ monitor_enhancement('output/react/', watch=True)
---
**Version:** 3.1.0-dev
**Last Updated:** 2026-02-18
**Version:** 3.2.0
**Last Updated:** 2026-03-15
**Status:** ✅ Production Ready


@@ -1,8 +1,8 @@
# CLI Reference - Skill Seekers
> **Version:** 3.1.2
> **Last Updated:** 2026-02-23
> **Complete reference for all 20 CLI commands**
> **Version:** 3.2.0
> **Last Updated:** 2026-03-15
> **Complete reference for all 30 CLI commands**
---
@@ -14,19 +14,29 @@
- [Environment Variables](#environment-variables)
- [Command Reference](#command-reference)
- [analyze](#analyze) - Analyze local codebase
- [asciidoc](#asciidoc) - Extract from AsciiDoc files
- [chat](#chat) - Extract from Slack/Discord
- [config](#config) - Configuration wizard
- [confluence](#confluence) - Extract from Confluence
- [create](#create) - Create skill (auto-detects source)
- [enhance](#enhance) - AI enhancement (local mode)
- [enhance-status](#enhance-status) - Monitor enhancement
- [estimate](#estimate) - Estimate page counts
- [github](#github) - Scrape GitHub repository
- [html](#html) - Extract from local HTML files
- [install](#install) - One-command complete workflow
- [install-agent](#install-agent) - Install to AI agent
- [jupyter](#jupyter) - Extract from Jupyter notebooks
- [manpage](#manpage) - Extract from man pages
- [multilang](#multilang) - Multi-language docs
- [notion](#notion) - Extract from Notion
- [openapi](#openapi) - Extract from OpenAPI/Swagger specs
- [package](#package) - Package skill for platform
- [pdf](#pdf) - Extract from PDF
- [pptx](#pptx) - Extract from PowerPoint files
- [quality](#quality) - Quality scoring
- [resume](#resume) - Resume interrupted jobs
- [rss](#rss) - Extract from RSS/Atom feeds
- [scrape](#scrape) - Scrape documentation
- [stream](#stream) - Stream large files
- [unified](#unified) - Multi-source scraping
@@ -42,7 +52,7 @@
## Overview
Skill Seekers provides a unified CLI for converting documentation, GitHub repositories, PDFs, and local codebases into AI-ready skills.
Skill Seekers provides a unified CLI for converting 17 source types (documentation sites, GitHub repositories, PDFs, videos, notebooks, wikis, and more) into AI-ready skills for 16+ LLM platforms and RAG pipelines.
### Installation
@@ -172,6 +182,74 @@ skill-seekers analyze --directory ./my-project --skip-dependency-graph --skip-pa
---
### asciidoc
Extract content from AsciiDoc files and generate skill.
**Purpose:** Convert `.adoc` / `.asciidoc` documentation into AI-ready skills.
**Syntax:**
```bash
skill-seekers asciidoc [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--asciidoc-path PATH` | Path to AsciiDoc file or directory |
| `-n, --name` | Skill name |
| `--from-json FILE` | Build from extracted JSON |
| `--enhance-level` | AI enhancement (default: 0) |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Single file
skill-seekers asciidoc --asciidoc-path guide.adoc --name my-guide
# Directory of AsciiDoc files
skill-seekers asciidoc --asciidoc-path ./docs/ --name project-docs
```
---
### chat
Extract knowledge from Slack or Discord chat exports.
**Purpose:** Convert chat history into searchable AI-ready skills.
**Syntax:**
```bash
skill-seekers chat [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--export-path PATH` | Path to chat export directory or file |
| `--platform {slack,discord}` | Chat platform (default: slack) |
| `--token TOKEN` | API token for authentication |
| `--channel CHANNEL` | Channel name or ID to extract from |
| `--max-messages N` | Max messages to extract (default: 10000) |
| `-n, --name` | Skill name |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# From Slack export
skill-seekers chat --export-path ./slack-export/ --name team-knowledge
# From Discord via API
skill-seekers chat --platform discord --token $DISCORD_TOKEN --channel general --name discord-docs
```
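For orientation, a Slack workspace export is a directory of per-channel folders, each holding one JSON file of messages per day. The following is a minimal reading sketch, not the project's actual extractor (the function name is hypothetical, and the real scraper also handles threads, user-ID resolution, and attachments):

```python
import json
from pathlib import Path

def read_slack_channel(export_dir: str, channel: str, max_messages: int = 10000):
    """Collect message texts from a Slack export's channel folder."""
    messages = []
    # Each day of history is a separate JSON file, e.g. 2026-01-01.json
    for day_file in sorted(Path(export_dir, channel).glob("*.json")):
        for msg in json.loads(day_file.read_text(encoding="utf-8")):
            # Skip join/leave events and other entries without text
            if msg.get("type") == "message" and "text" in msg:
                messages.append(msg["text"])
            if len(messages) >= max_messages:
                return messages
    return messages
```

The `--max-messages` cap above maps naturally onto the `max_messages` parameter in this sketch.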
---
### config
Interactive configuration wizard for API keys and settings.
@@ -210,6 +288,43 @@ skill-seekers config --test
---
### confluence
Extract content from Confluence wikis.
**Purpose:** Convert Confluence spaces into AI-ready skills via API or HTML export.
**Syntax:**
```bash
skill-seekers confluence [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--base-url URL` | Confluence instance base URL |
| `--space-key KEY` | Confluence space key |
| `--export-path PATH` | Path to Confluence HTML/XML export directory |
| `--username USER` | Confluence username |
| `--token TOKEN` | Confluence API token |
| `--max-pages N` | Max pages to extract (default: 500) |
| `-n, --name` | Skill name |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Via API
skill-seekers confluence --base-url https://wiki.example.com --space-key DEV \
--username user@example.com --token $CONFLUENCE_TOKEN --name dev-wiki
# From export
skill-seekers confluence --export-path ./confluence-export/ --name team-docs
```
---
### create
Create skill from any source. Auto-detects source type.
@@ -234,6 +349,15 @@ skill-seekers create [source] [options]
| `owner/repo` | GitHub | `facebook/react` |
| `./path` | Local codebase | `./my-project` |
| `*.pdf` | PDF | `manual.pdf` |
| `*.docx` | Word | `report.docx` |
| `*.epub` | EPUB | `book.epub` |
| `*.ipynb` | Jupyter Notebook | `analysis.ipynb` |
| `*.html`/`*.htm` | Local HTML | `docs.html` |
| `*.yaml`/`*.yml` | OpenAPI/Swagger | `openapi.yaml` |
| `*.adoc`/`*.asciidoc` | AsciiDoc | `guide.adoc` |
| `*.pptx` | PowerPoint | `slides.pptx` |
| `*.rss`/`*.atom` | RSS/Atom feed | `feed.rss` |
| `*.1`-`*.8`/`*.man` | Man page | `grep.1` |
| `*.json` | Config file | `config.json` |
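The detection rules above boil down to a URL check, an extension map, and an `owner/repo` pattern. A minimal sketch (illustrative only; `detect_source_type` and `EXTENSION_MAP` are hypothetical names, and the real `create` command also special-cases github.com URLs and inspects YAML content for OpenAPI markers):

```python
import re

# Extension-based detection, following the table above.
EXTENSION_MAP = {
    ".pdf": "pdf", ".docx": "word", ".epub": "epub", ".ipynb": "jupyter",
    ".html": "html", ".htm": "html", ".yaml": "openapi", ".yml": "openapi",
    ".adoc": "asciidoc", ".asciidoc": "asciidoc", ".pptx": "pptx",
    ".rss": "rss", ".atom": "rss", ".man": "manpage", ".json": "config",
}

def detect_source_type(source: str) -> str:
    if source.startswith(("http://", "https://")):
        return "documentation"
    lowered = source.lower()
    for ext, source_type in EXTENSION_MAP.items():
        if lowered.endswith(ext):
            return source_type
    if re.search(r"\.[1-8]$", source):
        return "manpage"          # man section suffix, e.g. grep.1
    if re.fullmatch(r"[\w.-]+/[\w.-]+", source) and not source.startswith("."):
        return "github"           # owner/repo shorthand
    return "local"                # fall back to local codebase analysis
```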
**Flags:**
@@ -473,6 +597,39 @@ skill-seekers github --repo facebook/react --scrape-only
---
### html
Extract content from local HTML files and generate skill.
**Purpose:** Convert local HTML documentation into AI-ready skills (for offline/exported docs).
**Syntax:**
```bash
skill-seekers html [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--html-path PATH` | Path to HTML file or directory |
| `-n, --name` | Skill name |
| `--from-json FILE` | Build from extracted JSON |
| `--enhance-level` | AI enhancement (default: 0) |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Single HTML file
skill-seekers html --html-path docs/index.html --name my-docs
# Directory of HTML files
skill-seekers html --html-path ./html-export/ --name exported-docs
```
---
### install
One-command complete workflow: fetch → scrape → enhance → package → upload.
@@ -558,6 +715,72 @@ skill-seekers install-agent output/react/ --agent cursor --force
---
### jupyter
Extract content from Jupyter Notebook files and generate skill.
**Purpose:** Convert `.ipynb` notebooks into AI-ready skills with code, markdown, and outputs.
**Syntax:**
```bash
skill-seekers jupyter [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--notebook PATH` | Path to .ipynb file or directory |
| `-n, --name` | Skill name |
| `--from-json FILE` | Build from extracted JSON |
| `--enhance-level` | AI enhancement (default: 0) |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Single notebook
skill-seekers jupyter --notebook analysis.ipynb --name data-analysis
# Directory of notebooks
skill-seekers jupyter --notebook ./notebooks/ --name ml-tutorials
```
---
### manpage
Extract content from Unix/Linux man pages and generate skill.
**Purpose:** Convert man pages into AI-ready reference skills.
**Syntax:**
```bash
skill-seekers manpage [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--man-names NAMES` | Comma-separated man page names (e.g., `ls,grep,find`) |
| `--man-path PATH` | Path to directory containing man page files |
| `--sections SECTIONS` | Comma-separated section numbers (e.g., `1,3,8`) |
| `-n, --name` | Skill name |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# By name (system man pages)
skill-seekers manpage --man-names ls,grep,find,awk --name unix-essentials
# From directory
skill-seekers manpage --man-path /usr/share/man/man1/ --sections 1 --name section1-cmds
```
---
### multilang
Multi-language documentation support.
@@ -590,6 +813,75 @@ skill-seekers multilang --config configs/docs.json --languages en,zh,es
---
### notion
Extract content from Notion workspaces.
**Purpose:** Convert Notion pages and databases into AI-ready skills via API or export.
**Syntax:**
```bash
skill-seekers notion [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--database-id ID` | Notion database ID to extract from |
| `--page-id ID` | Notion page ID to extract from |
| `--export-path PATH` | Path to Notion export directory |
| `--token TOKEN` | Notion integration token |
| `--max-pages N` | Max pages to extract (default: 500) |
| `-n, --name` | Skill name |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Via API
skill-seekers notion --database-id abc123 --token $NOTION_TOKEN --name team-docs
# From export
skill-seekers notion --export-path ./notion-export/ --name project-wiki
```
---
### openapi
Extract content from OpenAPI/Swagger specifications and generate skill.
**Purpose:** Convert API specs into AI-ready reference skills with endpoint documentation.
**Syntax:**
```bash
skill-seekers openapi [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--spec PATH` | Path to OpenAPI/Swagger spec file |
| `--spec-url URL` | URL to OpenAPI/Swagger spec |
| `-n, --name` | Skill name |
| `--from-json FILE` | Build from extracted JSON |
| `--enhance-level` | AI enhancement (default: 0) |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# From local file
skill-seekers openapi --spec api/openapi.yaml --name my-api
# From URL
skill-seekers openapi --spec-url https://petstore.swagger.io/v2/swagger.json --name petstore
```
---
### package
Package skill directory into platform-specific format.
@@ -713,6 +1005,39 @@ skill-seekers pdf --pdf manual.pdf --name test --dry-run
---
### pptx
Extract content from PowerPoint files and generate skill.
**Purpose:** Convert `.pptx` presentations into AI-ready skills.
**Syntax:**
```bash
skill-seekers pptx [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--pptx PATH` | Path to PowerPoint file (.pptx) |
| `-n, --name` | Skill name |
| `--from-json FILE` | Build from extracted JSON |
| `--enhance-level` | AI enhancement (default: 0) |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# Extract from presentation
skill-seekers pptx --pptx training-slides.pptx --name training-material
# With enhancement
skill-seekers pptx --pptx architecture.pptx --name arch-overview --enhance-level 2
```
---
### quality
Analyze and score skill documentation quality.
@@ -791,6 +1116,41 @@ skill-seekers resume --clean
---
### rss
Extract content from RSS/Atom feeds and generate skill.
**Purpose:** Convert blog feeds and news sources into AI-ready skills.
**Syntax:**
```bash
skill-seekers rss [options]
```
**Key Flags:**
| Flag | Description |
|------|-------------|
| `--feed-url URL` | URL of the RSS/Atom feed |
| `--feed-path PATH` | Path to local RSS/Atom feed file |
| `--follow-links` | Follow article links for full content (default: true) |
| `--no-follow-links` | Use feed summary only |
| `--max-articles N` | Max articles to extract (default: 50) |
| `-n, --name` | Skill name |
| `--dry-run` | Preview without executing |
**Examples:**
```bash
# From URL
skill-seekers rss --feed-url https://blog.example.com/feed.xml --name blog-knowledge
# From local file, summaries only
skill-seekers rss --feed-path ./feed.rss --no-follow-links --name feed-summaries
```
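The paired `--follow-links` / `--no-follow-links` flags follow the standard argparse pattern of a default-on boolean with an explicit negation. A generic sketch of that pattern, not the project's actual parser:

```python
import argparse

parser = argparse.ArgumentParser(prog="rss-example")
# Both flags write to the same destination; the negation wins if given.
parser.add_argument("--follow-links", dest="follow_links",
                    action="store_true", default=True,
                    help="Follow article links for full content (default)")
parser.add_argument("--no-follow-links", dest="follow_links",
                    action="store_false",
                    help="Use feed summary only")

args = parser.parse_args(["--no-follow-links"])
print(args.follow_links)  # False
```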
---
### scrape
Scrape documentation website and generate skill.


@@ -1,8 +1,8 @@
# Config Format Reference - Skill Seekers
> **Version:** 3.1.4
> **Last Updated:** 2026-02-26
> **Complete JSON configuration specification**
> **Version:** 3.2.0
> **Last Updated:** 2026-03-15
> **Complete JSON configuration specification for 17 source types**
---
@@ -14,6 +14,7 @@
- [GitHub Source](#github-source)
- [PDF Source](#pdf-source)
- [Local Source](#local-source)
- [Additional Source Types](#additional-source-types)
- [Unified (Multi-Source) Config](#unified-multi-source-config)
- [Common Fields](#common-fields)
- [Selectors](#selectors)
@@ -266,6 +267,158 @@ For analyzing local codebases.
---
### Additional Source Types
The following 10 source types were added in v3.2.0. Each can be used as a standalone config or within a unified `sources` array.
#### Jupyter Notebook Source
```json
{
"name": "ml-tutorial",
"sources": [{
"type": "jupyter",
"notebook_path": "notebooks/tutorial.ipynb"
}]
}
```
#### Local HTML Source
```json
{
"name": "offline-docs",
"sources": [{
"type": "html",
"html_path": "./exported-docs/"
}]
}
```
#### OpenAPI/Swagger Source
```json
{
"name": "petstore-api",
"sources": [{
"type": "openapi",
"spec_path": "api/openapi.yaml",
"spec_url": "https://petstore.swagger.io/v2/swagger.json"
}]
}
```
#### AsciiDoc Source
```json
{
"name": "project-guide",
"sources": [{
"type": "asciidoc",
"asciidoc_path": "./docs/guide.adoc"
}]
}
```
#### PowerPoint Source
```json
{
"name": "training-slides",
"sources": [{
"type": "pptx",
"pptx_path": "presentations/training.pptx"
}]
}
```
#### RSS/Atom Feed Source
```json
{
"name": "engineering-blog",
"sources": [{
"type": "rss",
"feed_url": "https://engineering.example.com/feed.xml",
"follow_links": true,
"max_articles": 50
}]
}
```
#### Man Page Source
```json
{
"name": "unix-tools",
"sources": [{
"type": "manpage",
"man_names": "ls,grep,find,awk,sed",
"sections": "1,3"
}]
}
```
#### Confluence Source
```json
{
"name": "team-wiki",
"sources": [{
"type": "confluence",
"base_url": "https://wiki.example.com",
"space_key": "DEV",
"username": "user@example.com",
"max_pages": 500
}]
}
```
#### Notion Source
```json
{
"name": "product-docs",
"sources": [{
"type": "notion",
"database_id": "abc123def456",
"max_pages": 500
}]
}
```
#### Chat (Slack/Discord) Source
```json
{
"name": "team-knowledge",
"sources": [{
"type": "chat",
"export_path": "./slack-export/",
"platform": "slack",
"channel": "engineering",
"max_messages": 10000
}]
}
```
#### Additional Source Fields Reference
| Source Type | Required Fields | Optional Fields |
|-------------|-----------------|-----------------|
| `jupyter` | `notebook_path` | — |
| `html` | `html_path` | — |
| `openapi` | `spec_path` or `spec_url` | — |
| `asciidoc` | `asciidoc_path` | — |
| `pptx` | `pptx_path` | — |
| `rss` | `feed_url` or `feed_path` | `follow_links`, `max_articles` |
| `manpage` | `man_names` or `man_path` | `sections` |
| `confluence` | `base_url` + `space_key` or `export_path` | `username`, `token`, `max_pages` |
| `notion` | `database_id` or `page_id` or `export_path` | `token`, `max_pages` |
| `chat` | `export_path` | `platform`, `token`, `channel`, `max_messages` |
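The "or" / "+" rules in the table can be expressed as alternative field sets, where each alternative must be present in full. A minimal validation sketch under that reading (`REQUIRED` and `validate_source` are illustrative names, not the project's API):

```python
# Each source type maps to a list of alternatives; an alternative is a
# set of fields that must all be present together.
REQUIRED = {
    "jupyter":    [{"notebook_path"}],
    "html":       [{"html_path"}],
    "openapi":    [{"spec_path"}, {"spec_url"}],
    "asciidoc":   [{"asciidoc_path"}],
    "pptx":       [{"pptx_path"}],
    "rss":        [{"feed_url"}, {"feed_path"}],
    "manpage":    [{"man_names"}, {"man_path"}],
    "confluence": [{"base_url", "space_key"}, {"export_path"}],
    "notion":     [{"database_id"}, {"page_id"}, {"export_path"}],
    "chat":       [{"export_path"}],
}

def validate_source(config: dict) -> bool:
    """True if the source dict satisfies at least one required-field set."""
    alternatives = REQUIRED.get(config.get("type"), [])
    return any(alt <= config.keys() for alt in alternatives)
```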
---
## Unified (Multi-Source) Config
Combine multiple sources into one skill with conflict detection.
@@ -380,14 +533,27 @@ Unified configs support defining enhancement workflows at the top level:
#### Source Types in Unified Config
Each source in the `sources` array can be:
Each source in the `sources` array can be any of the 17 supported types:
| Type | Required Fields |
|------|-----------------|
| `docs` | `base_url` |
| `documentation` / `docs` | `base_url` |
| `github` | `repo` |
| `pdf` | `pdf_path` |
| `word` | `docx_path` |
| `epub` | `epub_path` |
| `video` | `url` or `video_path` |
| `local` | `directory` |
| `jupyter` | `notebook_path` |
| `html` | `html_path` |
| `openapi` | `spec_path` or `spec_url` |
| `asciidoc` | `asciidoc_path` |
| `pptx` | `pptx_path` |
| `rss` | `feed_url` or `feed_path` |
| `manpage` | `man_names` or `man_path` |
| `confluence` | `base_url` + `space_key` or `export_path` |
| `notion` | `database_id` or `page_id` or `export_path` |
| `chat` | `export_path` |
---
@@ -606,6 +772,44 @@ Control which URLs are included or excluded:
}
```
### Unified with New Source Types
```json
{
"name": "project-complete",
"description": "Full project knowledge from multiple source types",
"merge_mode": "claude-enhanced",
"sources": [
{
"type": "docs",
"name": "project-docs",
"base_url": "https://docs.example.com/",
"max_pages": 200
},
{
"type": "github",
"name": "project-code",
"repo": "example/project"
},
{
"type": "openapi",
"name": "project-api",
"spec_path": "api/openapi.yaml"
},
{
"type": "confluence",
"name": "project-wiki",
"export_path": "./confluence-export/"
},
{
"type": "jupyter",
"name": "project-notebooks",
"notebook_path": "./notebooks/"
}
]
}
```
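When several of the sources above cover the same topic, the merge step must pick a winner. A rough illustration of one rule-based policy, preferring code-derived entries on conflict (this only sketches the idea; the real merge logic, and the `merge_prefer_code` name, are not from the project):

```python
# Sources whose content is derived from code rather than prose.
CODE_SOURCES = {"github", "local"}

def merge_prefer_code(entries: list[dict]) -> dict:
    """entries: [{"topic": ..., "source_type": ..., "content": ...}]"""
    merged: dict[str, dict] = {}
    for entry in entries:
        topic = entry["topic"]
        current = merged.get(topic)
        if current is None:
            merged[topic] = entry
        elif (entry["source_type"] in CODE_SOURCES
              and current["source_type"] not in CODE_SOURCES):
            merged[topic] = entry   # code wins over prose on conflict
    return merged
```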
### Local Project
```json


@@ -13,28 +13,55 @@ Complete feature support across all platforms and skill modes.
## Skill Mode Support
| Mode | Description | Platforms | Example Configs |
|------|-------------|-----------|-----------------|
| **Documentation** | Scrape HTML docs | All 4 | react.json, django.json (14 total) |
| **GitHub** | Analyze repositories | All 4 | react_github.json, godot_github.json |
| **PDF** | Extract from PDFs | All 4 | example_pdf.json |
| **Unified** | Multi-source (docs+GitHub+PDF) | All 4 | react_unified.json (5 total) |
| **Local Repo** | Unlimited local analysis | All 4 | deck_deck_go_local.json |
| Mode | Description | Platforms | CLI Command | `create` Detection |
|------|-------------|-----------|-------------|-------------------|
| **Documentation** | Scrape HTML docs | All 4 | `scrape` | `https://...` URLs |
| **GitHub** | Analyze repositories | All 4 | `github` | `owner/repo` or github.com URLs |
| **PDF** | Extract from PDFs | All 4 | `pdf` | `.pdf` extension |
| **Word** | Extract from DOCX | All 4 | `word` | `.docx` extension |
| **EPUB** | Extract from EPUB | All 4 | `epub` | `.epub` extension |
| **Video** | Video transcription | All 4 | `video` | YouTube/Vimeo URLs, video extensions |
| **Local Repo** | Local codebase analysis | All 4 | `analyze` | Directory paths |
| **Jupyter** | Extract from notebooks | All 4 | `jupyter` | `.ipynb` extension |
| **HTML** | Extract local HTML files | All 4 | `html` | `.html`/`.htm` extension |
| **OpenAPI** | Extract API specs | All 4 | `openapi` | `.yaml`/`.yml` with OpenAPI content |
| **AsciiDoc** | Extract AsciiDoc files | All 4 | `asciidoc` | `.adoc`/`.asciidoc` extension |
| **PowerPoint** | Extract from PPTX | All 4 | `pptx` | `.pptx` extension |
| **RSS/Atom** | Extract from feeds | All 4 | `rss` | `.rss`/`.atom` extension |
| **Man Pages** | Extract man pages | All 4 | `manpage` | `.1`-`.8`/`.man` extension |
| **Confluence** | Extract from Confluence | All 4 | `confluence` | API or export directory |
| **Notion** | Extract from Notion | All 4 | `notion` | API or export directory |
| **Chat** | Extract Slack/Discord | All 4 | `chat` | Export directory or API |
| **Unified** | Multi-source combination | All 4 | `unified` | N/A (config-driven) |
## CLI Command Support
| Command | Platforms | Skill Modes | Multi-Platform Flag |
|---------|-----------|-------------|---------------------|
| `scrape` | All | Docs only | No (output is universal) |
| `github` | All | GitHub only | No (output is universal) |
| `pdf` | All | PDF only | No (output is universal) |
| `unified` | All | Unified only | No (output is universal) |
| `enhance` | Claude, Gemini, OpenAI | All | ✅ `--target` |
| `package` | All | All | ✅ `--target` |
| `upload` | Claude, Gemini, OpenAI | All | ✅ `--target` |
| `estimate` | All | Docs only | No (estimation is universal) |
| `install` | All | All | ✅ `--target` |
| `install-agent` | All | All | No (agent-specific paths) |
| Command | Platforms | Skill Modes | Multi-Platform Flag | Optional Deps |
|---------|-----------|-------------|---------------------|---------------|
| `scrape` | All | Docs only | No (output is universal) | None |
| `github` | All | GitHub only | No (output is universal) | None |
| `pdf` | All | PDF only | No (output is universal) | `[pdf]` |
| `word` | All | Word only | No (output is universal) | `[word]` |
| `epub` | All | EPUB only | No (output is universal) | `[epub]` |
| `video` | All | Video only | No (output is universal) | `[video]` |
| `analyze` | All | Local only | No (output is universal) | None |
| `jupyter` | All | Jupyter only | No (output is universal) | `[jupyter]` |
| `html` | All | HTML only | No (output is universal) | None |
| `openapi` | All | OpenAPI only | No (output is universal) | `[openapi]` |
| `asciidoc` | All | AsciiDoc only | No (output is universal) | `[asciidoc]` |
| `pptx` | All | PPTX only | No (output is universal) | `[pptx]` |
| `rss` | All | RSS only | No (output is universal) | `[rss]` |
| `manpage` | All | Man pages only | No (output is universal) | None |
| `confluence` | All | Confluence only | No (output is universal) | `[confluence]` |
| `notion` | All | Notion only | No (output is universal) | `[notion]` |
| `chat` | All | Chat only | No (output is universal) | `[chat]` |
| `unified` | All | Unified only | No (output is universal) | Varies by source |
| `enhance` | Claude, Gemini, OpenAI | All | ✅ `--target` | None |
| `package` | All | All | ✅ `--target` | None |
| `upload` | Claude, Gemini, OpenAI | All | ✅ `--target` | None |
| `estimate` | All | Docs only | No (estimation is universal) | None |
| `install` | All | All | ✅ `--target` | None |
| `install-agent` | All | All | No (agent-specific paths) | None |
## MCP Tool Support
@@ -50,6 +77,7 @@ Complete feature support across all platforms and skill modes.
| `scrape_docs` | All | Docs + Unified | No (output is universal) |
| `scrape_github` | All | GitHub only | No (output is universal) |
| `scrape_pdf` | All | PDF only | No (output is universal) |
| `scrape_generic` | All | 10 new types | No (output is universal) |
| **Packaging Tools** |
| `package_skill` | All | All | ✅ `target` parameter |
| `upload_skill` | Claude, Gemini, OpenAI | All | ✅ `target` parameter |
@@ -260,8 +288,21 @@ Before release, verify all combinations:
- [ ] Docs → Markdown
- [ ] GitHub → All platforms
- [ ] PDF → All platforms
- [ ] Unified → All platforms
- [ ] Word → All platforms
- [ ] EPUB → All platforms
- [ ] Video → All platforms
- [ ] Local Repo → All platforms
- [ ] Jupyter → All platforms
- [ ] HTML → All platforms
- [ ] OpenAPI → All platforms
- [ ] AsciiDoc → All platforms
- [ ] PPTX → All platforms
- [ ] RSS → All platforms
- [ ] Man Pages → All platforms
- [ ] Confluence → All platforms
- [ ] Notion → All platforms
- [ ] Chat → All platforms
- [ ] Unified → All platforms
## Platform-Specific Notes
@@ -310,7 +351,7 @@ A: Yes! Enhancement adds platform-specific formatting:
- OpenAI: Plain text assistant instructions
**Q: Do all skill modes work with all platforms?**
A: Yes! All 5 skill modes (Docs, GitHub, PDF, Unified, Local Repo) work with all 4 platforms.
A: Yes! All 17 source types work with all 4 platforms (Claude, Gemini, OpenAI, Markdown).
## See Also


@@ -1,8 +1,8 @@
# MCP Reference - Skill Seekers
> **Version:** 3.1.0
> **Last Updated:** 2026-02-16
> **Complete reference for 26 MCP tools**
> **Version:** 3.2.0
> **Last Updated:** 2026-03-15
> **Complete reference for 27 MCP tools**
---
@@ -79,7 +79,7 @@ Essential tools for basic skill creation workflow:
| `enhance_skill` | AI enhancement |
| `install_skill` | Complete workflow |
### Extended Tools (9)
### Extended Tools (10)
Advanced scraping and analysis tools:
@@ -88,6 +88,7 @@ Advanced scraping and analysis tools:
| `scrape_github` | GitHub repository analysis |
| `scrape_pdf` | PDF extraction |
| `scrape_codebase` | Local codebase analysis |
| `scrape_generic` | Generic scraper for 10 new source types |
| `unified_scrape` | Multi-source scraping |
| `detect_patterns` | Pattern detection |
| `extract_test_examples` | Extract usage examples from tests |
@@ -642,6 +643,65 @@ Find discrepancies between documentation and code.
---
#### scrape_generic
Scrape content from any of the 10 new source types.
**Purpose:** A generic entry point that delegates to the appropriate CLI scraper module for: jupyter, html, openapi, asciidoc, pptx, confluence, notion, rss, manpage, chat.
**Parameters:**
| Name | Type | Required | Description |
|------|------|----------|-------------|
| `source_type` | string | Yes | One of: `jupyter`, `html`, `openapi`, `asciidoc`, `pptx`, `confluence`, `notion`, `rss`, `manpage`, `chat` |
| `name` | string | Yes | Skill name for the output |
| `path` | string | No | File or directory path (for file-based sources) |
| `url` | string | No | URL (for URL-based sources like confluence, notion, rss) |
**Note:** Either `path` or `url` must be provided depending on the source type.
**Source Type → Input Mapping:**
| Source Type | Typical Input | CLI Flag Used |
|-------------|--------------|---------------|
| `jupyter` | `path` | `--notebook` |
| `html` | `path` | `--html-path` |
| `openapi` | `path` | `--spec` |
| `asciidoc` | `path` | `--asciidoc-path` |
| `pptx` | `path` | `--pptx` |
| `manpage` | `path` | `--man-path` |
| `confluence` | `path` or `url` | `--export-path` / `--base-url` |
| `notion` | `path` or `url` | `--export-path` / `--database-id` |
| `rss` | `path` or `url` | `--feed-path` / `--feed-url` |
| `chat` | `path` | `--export-path` |
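The mapping table translates directly into a dispatch structure. A hypothetical sketch of how `scrape_generic` might assemble the delegated CLI invocation (the flag names come from the table; `build_cli_args` itself is illustrative, not the actual implementation):

```python
# path-style inputs → CLI flag, per the mapping table above
PATH_FLAGS = {
    "jupyter": "--notebook", "html": "--html-path", "openapi": "--spec",
    "asciidoc": "--asciidoc-path", "pptx": "--pptx", "manpage": "--man-path",
    "confluence": "--export-path", "notion": "--export-path",
    "rss": "--feed-path", "chat": "--export-path",
}
# url-style inputs, for the sources that accept them
URL_FLAGS = {"confluence": "--base-url", "notion": "--database-id", "rss": "--feed-url"}

def build_cli_args(source_type, name, path=None, url=None):
    args = ["skill-seekers", source_type, "--name", name]
    if path is not None:
        args += [PATH_FLAGS[source_type], path]
    elif url is not None and source_type in URL_FLAGS:
        args += [URL_FLAGS[source_type], url]
    else:
        raise ValueError("either path or url is required for " + source_type)
    return args
```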
**Returns:** Scraping results with file paths and statistics
```json
{
"skill_directory": "output/my-api/",
"source_type": "openapi",
"status": "success"
}
```
**Example:**
```python
# Natural language
"Scrape the OpenAPI spec at api/openapi.yaml"
"Extract content from my Jupyter notebook analysis.ipynb"
"Process the Confluence export in ./wiki-export/"
"Convert the PowerPoint slides.pptx into a skill"
# Explicit tool call
scrape_generic(source_type="openapi", name="my-api", path="api/openapi.yaml")
scrape_generic(source_type="jupyter", name="ml-tutorial", path="notebooks/tutorial.ipynb")
scrape_generic(source_type="rss", name="blog", url="https://blog.example.com/feed.xml")
scrape_generic(source_type="confluence", name="wiki", path="./confluence-export/")
```
---
### Config Source Tools
#### add_config_source
@@ -1030,7 +1090,19 @@ Tools: `list_workflows` → `unified_scrape` → `enhance_skill` → `package_sk
---
### Pattern 5: Vector Database Export
### Pattern 5: New Source Type Scraping
```python
# Natural language sequence:
"Scrape the OpenAPI spec at api/openapi.yaml"
"Package the output for Claude"
```
Tools: `scrape_generic` → `package_skill`
---
### Pattern 6: Vector Database Export
```python
# Natural language sequence: