feat: add 10 new skill source types (17 total) with full pipeline integration

Add Jupyter Notebook, Local HTML, OpenAPI/Swagger, AsciiDoc, PowerPoint,
RSS/Atom, Man Pages, Confluence, Notion, and Slack/Discord Chat as new
skill source types. Each type is fully integrated across:

- Standalone CLI commands (skill-seekers <type>)
- Auto-detection via 'skill-seekers create' (file extension + content sniffing)
- Unified multi-source configs (scraped_data, dispatch, config validation)
- Unified skill builder (generic merge + source-attributed synthesis)
- MCP server (scrape_generic tool with per-type flag mapping)
- pyproject.toml (entry points, optional deps, [all] group)

Also fixes: EPUB unified pipeline gap, missing word/video config validators,
OpenAPI yaml import guard, MCP flag mismatch for all 10 types, stale
docstrings, and adds 77 integration tests + complex-merge workflow.

50 files changed, +20,201 lines
Author: yusyus
Date: 2026-03-15 15:30:15 +03:00
Commit: 53b911b697 (parent 64403a3686)
50 changed files with 20193 additions and 856 deletions

AGENTS.md

@@ -1,866 +1,171 @@
# AGENTS.md - Skill Seekers
Concise reference for AI coding agents. Skill Seekers is a Python CLI tool (v3.2.0) that converts documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more into AI-ready skills for 16+ LLM platforms and RAG pipelines.
---
## Project Overview
**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, PDF files, and videos into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.
### Key Facts
| Attribute | Value |
|-----------|-------|
| **Current Version** | 3.1.3 |
| **Python Version** | 3.10+ (tested on 3.10, 3.11, 3.12, 3.13) |
| **License** | MIT |
| **Package Name** | `skill-seekers` (PyPI) |
| **Source Files** | 182 Python files |
| **Test Files** | 105+ test files |
| **Website** | https://skillseekersweb.com/ |
| **Repository** | https://github.com/yusufkaraaslan/Skill_Seekers |
### Supported Target Platforms
| Platform | Format | Use Case |
|----------|--------|----------|
| **Claude AI** | ZIP + YAML | Claude Code skills |
| **Google Gemini** | tar.gz | Gemini skills |
| **OpenAI ChatGPT** | ZIP + Vector Store | Custom GPTs |
| **LangChain** | Documents | QA chains, agents, retrievers |
| **LlamaIndex** | TextNodes | Query engines, chat engines |
| **Haystack** | Documents | Enterprise RAG pipelines |
| **Pinecone** | Ready for upsert | Production vector search |
| **Weaviate** | Vector objects | Vector database |
| **Qdrant** | Points | Vector database |
| **Chroma** | Documents | Local vector database |
| **FAISS** | Index files | Local similarity search |
| **Cursor IDE** | .cursorrules | AI coding assistant rules |
| **Windsurf** | .windsurfrules | AI coding rules |
| **Cline** | .clinerules + MCP | VS Code extension |
| **Continue.dev** | HTTP context | Universal IDE support |
| **Generic Markdown** | ZIP | Universal export |
### Core Workflow
1. **Scrape Phase** - Crawl documentation/GitHub/PDF/video sources
2. **Build Phase** - Organize content into categorized references
3. **Enhancement Phase** - AI-powered quality improvements (optional)
4. **Package Phase** - Create platform-specific packages
5. **Upload Phase** - Auto-upload to target platform (optional)
---
## Project Structure
```
/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/
├── src/skill_seekers/ # Main source code (src/ layout)
│ ├── cli/ # CLI tools and commands (~70 modules)
│ │ ├── adaptors/ # Platform adaptors (Strategy pattern)
│ │ │ ├── base.py # Abstract base class (SkillAdaptor)
│ │ │ ├── claude.py # Claude AI adaptor
│ │ │ ├── gemini.py # Google Gemini adaptor
│ │ │ ├── openai.py # OpenAI ChatGPT adaptor
│ │ │ ├── markdown.py # Generic Markdown adaptor
│ │ │ ├── chroma.py # Chroma vector DB adaptor
│ │ │ ├── faiss_helpers.py # FAISS index adaptor
│ │ │ ├── haystack.py # Haystack RAG adaptor
│ │ │ ├── langchain.py # LangChain adaptor
│ │ │ ├── llama_index.py # LlamaIndex adaptor
│ │ │ ├── qdrant.py # Qdrant vector DB adaptor
│ │ │ ├── weaviate.py # Weaviate vector DB adaptor
│ │ │ └── streaming_adaptor.py # Streaming output adaptor
│ │ ├── arguments/ # CLI argument definitions
│ │ ├── parsers/ # Argument parsers
│ │ │ └── extractors/ # Content extractors
│ │ ├── presets/ # Preset configuration management
│ │ ├── storage/ # Cloud storage adaptors
│ │ ├── main.py # Unified CLI entry point
│ │ ├── create_command.py # Unified create command
│ │ ├── doc_scraper.py # Documentation scraper
│ │ ├── github_scraper.py # GitHub repository scraper
│ │ ├── pdf_scraper.py # PDF extraction
│ │ ├── word_scraper.py # Word document scraper
│ │ ├── video_scraper.py # Video extraction
│ │ ├── video_setup.py # GPU detection & dependency installation
│ │ ├── unified_scraper.py # Multi-source scraping
│ │ ├── codebase_scraper.py # Local codebase analysis
│ │ ├── enhance_command.py # AI enhancement command
│ │ ├── enhance_skill_local.py # AI enhancement (local mode)
│ │ ├── package_skill.py # Skill packager
│ │ ├── upload_skill.py # Upload to platforms
│ │ ├── cloud_storage_cli.py # Cloud storage CLI
│ │ ├── benchmark_cli.py # Benchmarking CLI
│ │ ├── sync_cli.py # Sync monitoring CLI
│ │ └── workflows_command.py # Workflow management CLI
│ ├── mcp/ # MCP server integration
│ │ ├── server_fastmcp.py # FastMCP server (~708 lines)
│ │ ├── server_legacy.py # Legacy server implementation
│ │ ├── server.py # Server entry point
│ │ ├── agent_detector.py # AI agent detection
│ │ ├── git_repo.py # Git repository operations
│ │ ├── source_manager.py # Config source management
│ │ └── tools/ # MCP tool implementations
│ │ ├── config_tools.py # Configuration tools
│ │ ├── packaging_tools.py # Packaging tools
│ │ ├── scraping_tools.py # Scraping tools
│ │ ├── source_tools.py # Source management tools
│ │ ├── splitting_tools.py # Config splitting tools
│ │ ├── vector_db_tools.py # Vector database tools
│ │ └── workflow_tools.py # Workflow management tools
│ ├── sync/ # Sync monitoring module
│ │ ├── detector.py # Change detection
│ │ ├── models.py # Data models (Pydantic)
│ │ ├── monitor.py # Monitoring logic
│ │ └── notifier.py # Notification system
│ ├── benchmark/ # Benchmarking framework
│ │ ├── framework.py # Benchmark framework
│ │ ├── models.py # Benchmark models
│ │ └── runner.py # Benchmark runner
│ ├── embedding/ # Embedding server
│ │ ├── server.py # FastAPI embedding server
│ │ ├── generator.py # Embedding generation
│ │ ├── cache.py # Embedding cache
│ │ └── models.py # Embedding models
│ ├── workflows/ # YAML workflow presets (66 presets)
│ ├── _version.py # Version information (reads from pyproject.toml)
│ └── __init__.py # Package init
├── tests/ # Test suite (105+ test files)
├── configs/ # Preset configuration files
├── docs/ # Documentation (80+ markdown files)
│ ├── integrations/ # Platform integration guides
│ ├── guides/ # User guides
│ ├── reference/ # API reference
│ ├── features/ # Feature documentation
│ ├── blog/ # Blog posts
│ └── roadmap/ # Roadmap documents
├── examples/ # Usage examples
├── .github/workflows/ # CI/CD workflows
├── pyproject.toml # Main project configuration
├── requirements.txt # Pinned dependencies
├── mypy.ini # MyPy type checker configuration
├── Dockerfile # Main Docker image (multi-stage)
├── Dockerfile.mcp # MCP server Docker image
└── docker-compose.yml # Full stack deployment
```
---
## Build and Development Commands
### Prerequisites
- Python 3.10 or higher
- pip or uv package manager
- Git (for GitHub scraping features)
### Setup (REQUIRED before any development)
```bash
# REQUIRED before running tests (src/ layout — tests fail without this)
pip install -e .
# Install with all platform dependencies
pip install -e ".[all-llms]"
# Install with all optional dependencies
pip install -e ".[all]"
# Install specific platforms only
pip install -e ".[gemini]" # Google Gemini support
pip install -e ".[openai]" # OpenAI ChatGPT support
pip install -e ".[mcp]" # MCP server dependencies
pip install -e ".[s3]" # AWS S3 support
pip install -e ".[gcs]" # Google Cloud Storage
pip install -e ".[azure]" # Azure Blob Storage
pip install -e ".[embedding]" # Embedding server support
pip install -e ".[rag-upload]" # Vector DB upload support
# Install dev dependencies (using dependency-groups)
pip install -e ".[dev]"
```
**CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first.
### Building
```bash
# Build package using uv (recommended)
uv build
# Or using standard build
python -m build
# Publish to PyPI
uv publish
```
### Docker
```bash
# Build Docker image
docker build -t skill-seekers .
# Run with docker-compose (includes vector databases)
docker-compose up -d
# Run MCP server only
docker-compose up -d mcp-server
# View logs
docker-compose logs -f mcp-server
```
---
## Testing Instructions
### Running Tests
**CRITICAL:** Never skip tests - all tests must pass before commits.
```bash
# All tests (must run `pip install -e .` first!)
pytest tests/ -v
# Run a single test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v
pytest tests/test_cloud_storage.py -v
# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
# Run a single test function
pytest tests/test_scraper_features.py::test_detect_language -v
# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v
# Run a single test class method
pytest tests/test_adaptors/test_claude_adaptor.py::TestClaudeAdaptor::test_package -v
# Skip slow tests
pytest tests/ -v -m "not slow"
# Run only integration tests
pytest tests/ -v -m integration
# Skip slow/integration tests
pytest tests/ -v -m "not slow and not integration"
```
### Test Architecture
- **105+ test files** covering all features
- **CI Matrix:** Ubuntu + macOS, Python 3.10-3.12
- Test markers defined in `pyproject.toml`:
| Marker | Description |
|--------|-------------|
| `slow` | Tests taking >5 seconds |
| `integration` | Requires external services (APIs) |
| `e2e` | End-to-end tests (resource-intensive) |
| `venv` | Requires virtual environment setup |
| `bootstrap` | Bootstrap skill specific |
| `benchmark` | Performance benchmark tests |
### Test Configuration
From `pyproject.toml`:
```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
```
The `conftest.py` file checks that the package is installed before running tests.
---
## Code Style Guidelines
### Linting and Formatting
```bash
# Lint (ruff)
ruff check src/ tests/
ruff check src/ tests/ --fix   # auto-fix issues
# Format (ruff)
ruff format --check src/ tests/
ruff format src/ tests/
# Type check (mypy)
mypy src/skill_seekers --show-error-codes --pretty
```
**Test markers:** `slow`, `integration`, `e2e`, `venv`, `bootstrap`, `benchmark`
**Async tests:** use `@pytest.mark.asyncio`; asyncio_mode is `auto`.
## Code Style
### Formatting Rules (ruff — from pyproject.toml)
- **Line length:** 100 characters
- **Target Python:** 3.10+
- **Enabled lint rules:** E, W, F, I, B, C4, UP, ARG, SIM
- **Ignored rules:** E501 (line length handled by formatter), F541 (f-string style), ARG002 (unused method args for interface compliance), B007 (intentional unused loop vars), I001 (formatter handles imports), SIM114 (readability preference)
- **Import sorting:** isort style with `skill_seekers` as first-party
### Imports
- Sort with isort (via ruff); `skill_seekers` is first-party
- Standard library → third-party → first-party, separated by blank lines
- Use `from __future__ import annotations` only if needed for forward refs
- Guard optional imports with try/except ImportError (see `adaptors/__init__.py` pattern)
### MyPy Configuration (from pyproject.toml)
```toml
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false
disallow_incomplete_defs = false
check_untyped_defs = true
ignore_missing_imports = true
show_error_codes = true
pretty = true
```
### Naming Conventions
- **Files:** `snake_case.py`
- **Classes:** `PascalCase` (e.g., `SkillAdaptor`, `ClaudeAdaptor`)
- **Functions/methods:** `snake_case`
- **Constants:** `UPPER_CASE` (e.g., `ADAPTORS`, `DEFAULT_CHUNK_TOKENS`)
- **Private:** prefix with `_`
### Type Hints
- Gradual typing — add hints where practical, not enforced everywhere
- Use modern syntax: `str | None` not `Optional[str]`, `list[str]` not `List[str]`
- MyPy config: `disallow_untyped_defs = false`, `check_untyped_defs = true`, `ignore_missing_imports = true`
### Docstrings
- Module-level docstring on every file (triple-quoted, describes purpose)
- Google-style or standard docstrings for public functions/classes
- Include `Args:`, `Returns:`, `Raises:` sections where useful
### Error Handling
- Use specific exceptions, never bare `except:`
- Provide helpful error messages with context (see `get_adaptor()` in `adaptors/__init__.py`)
- Use `raise ValueError(...)` for invalid arguments, `raise RuntimeError(...)` for state errors
- Guard optional dependency imports with try/except and give clear install instructions on failure
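The optional-dependency guard described above can be sketched as follows. This is a minimal illustration — `get_optional` and the extra names are hypothetical stand-ins, not the project's actual helpers:

```python
import importlib


def get_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with clear install instructions."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message, preserving the original cause
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f'Install it with: pip install -e ".[{extra}]"'
        ) from exc


mod = get_optional("json", "core")  # stdlib module: import succeeds
try:
    get_optional("definitely_missing_pkg", "mcp")
except ImportError as e:
    print(e)
```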
### Suppressing Lint Warnings
- Use inline `# noqa: XXXX` comments (e.g., `# noqa: F401` for re-exports, `# noqa: ARG001` for required but unused params)
## Supported Source Types (17)
| Type | CLI Command | Config Type | Detection |
|------|------------|-------------|-----------|
| Documentation (web) | `scrape` / `create <url>` | `documentation` | HTTP/HTTPS URLs |
| GitHub repo | `github` / `create owner/repo` | `github` | `owner/repo` or github.com URLs |
| PDF | `pdf` / `create file.pdf` | `pdf` | `.pdf` extension |
| Word (.docx) | `word` / `create file.docx` | `word` | `.docx` extension |
| EPUB | `epub` / `create file.epub` | `epub` | `.epub` extension |
| Video | `video` / `create <url/file>` | `video` | YouTube/Vimeo URLs, video extensions |
| Local codebase | `analyze` / `create ./path` | `local` | Directory paths |
| Jupyter Notebook | `jupyter` / `create file.ipynb` | `jupyter` | `.ipynb` extension |
| Local HTML | `html` / `create file.html` | `html` | `.html`/`.htm` extensions |
| OpenAPI/Swagger | `openapi` / `create spec.yaml` | `openapi` | `.yaml`/`.yml` with OpenAPI content |
| AsciiDoc | `asciidoc` / `create file.adoc` | `asciidoc` | `.adoc`/`.asciidoc` extensions |
| PowerPoint | `pptx` / `create file.pptx` | `pptx` | `.pptx` extension |
| RSS/Atom | `rss` / `create feed.rss` | `rss` | `.rss`/`.atom` extensions |
| Man pages | `manpage` / `create cmd.1` | `manpage` | `.1`-`.8`/`.man` extensions |
| Confluence | `confluence` | `confluence` | API or export directory |
| Notion | `notion` | `notion` | API or export directory |
| Slack/Discord | `chat` | `chat` | Export directory or API |
## Project Layout
```
src/skill_seekers/ # Main package (src/ layout)
cli/ # CLI commands and entry points
adaptors/ # Platform adaptors (Strategy pattern, inherit SkillAdaptor)
arguments/ # CLI argument definitions (one per source type)
parsers/ # Subcommand parsers (one per source type)
storage/ # Cloud storage (inherit BaseStorageAdaptor)
main.py # Unified CLI entry point (COMMAND_MODULES dict)
source_detector.py # Auto-detects source type from user input
create_command.py # Unified `create` command routing
config_validator.py # VALID_SOURCE_TYPES set + per-type validation
unified_scraper.py # Multi-source orchestrator (scraped_data + dispatch)
unified_skill_builder.py # Pairwise synthesis + generic merge
mcp/ # MCP server (FastMCP + legacy)
tools/ # MCP tool implementations by category
sync/ # Sync monitoring (Pydantic models)
benchmark/ # Benchmarking framework
embedding/ # FastAPI embedding server
workflows/ # 67 YAML workflow presets (includes complex-merge.yaml)
_version.py # Reads version from pyproject.toml
tests/ # 115+ test files (pytest)
configs/ # Preset JSON scraping configs
docs/ # 80+ markdown doc files
```
### Code Conventions
1. **Use type hints** where practical (gradual typing approach)
2. **Docstrings:** Use Google-style or standard docstrings
3. **Error handling:** Use specific exceptions, provide helpful messages
4. **Async code:** Use `asyncio`, mark tests with `@pytest.mark.asyncio`
5. **File naming:** Use snake_case for all Python files
6. **Class naming:** Use PascalCase for classes
7. **Function naming:** Use snake_case for functions and methods
8. **Constants:** Use UPPER_CASE for module-level constants
## Key Patterns
**Adaptor (Strategy) pattern** — all platform logic in `cli/adaptors/`. Inherit `SkillAdaptor`, implement `format_skill_md()`, `package()`, `upload()`. Register in `adaptors/__init__.py` ADAPTORS dict.
**Scraper pattern** — each source type has: `cli/<type>_scraper.py` (with `<Type>ToSkillConverter` class + `main()`), `arguments/<type>.py`, `parsers/<type>_parser.py`. Register in `parsers/__init__.py` PARSERS list, `main.py` COMMAND_MODULES dict, `config_validator.py` VALID_SOURCE_TYPES set.
**Unified pipeline** — `unified_scraper.py` dispatches to per-type `_scrape_<type>()` methods. `unified_skill_builder.py` uses pairwise synthesis for docs+github+pdf combos and `_generic_merge()` for all other combinations.
**MCP tools** — grouped in `mcp/tools/` by category. `scrape_generic_tool` handles all new source types.
---
## Architecture Patterns
### Platform Adaptor Pattern (Strategy Pattern)
All platform-specific logic is encapsulated in adaptors:
```python
import os
from skill_seekers.cli.adaptors import get_adaptor
# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'langchain', etc.
# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')
# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY'),
)
```
Each adaptor inherits from `SkillAdaptor` base class and implements:
- `format_skill_md()` - Format SKILL.md content
- `package()` - Create platform-specific package
- `upload()` - Upload to platform API
- `validate_api_key()` - Validate API key format
- `supports_enhancement()` - Whether AI enhancement is supported
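A minimal sketch of what such an adaptor can look like. The base class here is a self-contained stand-in so the snippet runs on its own; the real `SkillAdaptor` lives in `cli/adaptors/base.py` and its signatures may differ:

```python
from abc import ABC, abstractmethod


class SkillAdaptor(ABC):  # stand-in for the real base class
    @abstractmethod
    def format_skill_md(self, skill: dict) -> str: ...

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> str: ...

    @abstractmethod
    def upload(self, package_path: str, api_key: str) -> bool: ...

    def validate_api_key(self, api_key: str) -> bool:
        return bool(api_key)

    def supports_enhancement(self) -> bool:
        return False


class MyPlatformAdaptor(SkillAdaptor):
    """Hypothetical platform adaptor following the interface above."""

    def format_skill_md(self, skill: dict) -> str:
        return f"# {skill['name']}\n\n{skill['description']}"

    def package(self, skill_dir: str, output_path: str) -> str:
        name = skill_dir.rstrip('/').split('/')[-1]
        return f"{output_path}/{name}-myplatform.zip"

    def upload(self, package_path: str, api_key: str) -> bool:
        return self.validate_api_key(api_key)
```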
### CLI Architecture (Git-style)
Entry point: `src/skill_seekers/cli/main.py`
The CLI uses subcommands that delegate to existing modules:
```bash
# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
```
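The delegation above can be sketched like this. Names are illustrative; the real mapping lives in `main.py`'s COMMAND_MODULES dict:

```python
import sys


def doc_scraper_main() -> list[str]:
    # Stand-in for doc_scraper.main(); a real main() would parse these args
    return sys.argv[1:]


def dispatch(argv: list[str]):
    """skill-seekers <subcommand> [args...] -> delegate to the submodule's main()."""
    command_modules = {"scrape": doc_scraper_main}  # illustrative registry
    subcommand, rest = argv[0], argv[1:]
    sys.argv = [f"skill-seekers-{subcommand}", *rest]  # rewrite argv for the delegate
    return command_modules[subcommand]()


args = dispatch(["scrape", "--config", "react.json"])
print(args)  # ['--config', 'react.json']
```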
**Available subcommands:**
- `create` - Unified create command
- `config` - Configuration wizard
- `scrape` - Documentation scraping
- `github` - GitHub repository scraping
- `pdf` - PDF extraction
- `word` - Word document extraction
- `video` - Video extraction (YouTube or local). Use `--setup` to auto-detect GPU and install visual deps.
- `unified` - Multi-source scraping
- `analyze` / `codebase` - Local codebase analysis
- `enhance` - AI enhancement
- `package` - Package skill for target platform
- `upload` - Upload to platform
- `cloud` - Cloud storage operations
- `sync` - Sync monitoring
- `benchmark` - Performance benchmarking
- `embed` - Embedding server
- `install` / `install-agent` - Complete workflow
- `stream` - Streaming ingestion
- `update` - Incremental updates
- `multilang` - Multi-language support
- `quality` - Quality metrics
- `resume` - Resume interrupted jobs
- `estimate` - Estimate page counts
- `workflows` - Workflow management
### MCP Server Architecture
Two implementations:
- `server_fastmcp.py` - Modern, decorator-based (recommended, ~708 lines)
- `server_legacy.py` - Legacy implementation
Tools are organized by category:
- Config tools (3 tools): generate_config, list_configs, validate_config
- Scraping tools (10 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video (supports `setup` parameter for GPU detection and visual dep installation), scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
- Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill
- Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
- Splitting tools (2 tools): split_config, generate_router
- Vector Database tools (4 tools): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant
- Workflow tools (5 tools): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
**Running MCP Server:**
```bash
# Stdio transport (default)
python -m skill_seekers.mcp.server_fastmcp
# HTTP transport
python -m skill_seekers.mcp.server_fastmcp --http --port 8765
```
### Cloud Storage Architecture
Abstract base class pattern for cloud providers:
- `base_storage.py` - Defines `BaseStorageAdaptor` interface
- `s3_storage.py` - AWS S3 implementation
- `gcs_storage.py` - Google Cloud Storage implementation
- `azure_storage.py` - Azure Blob Storage implementation
### Sync Monitoring Architecture
Pydantic-based models in `src/skill_seekers/sync/`:
- `models.py` - Data models (SyncConfig, ChangeReport, SyncState)
- `detector.py` - Change detection logic
- `monitor.py` - Monitoring daemon
- `notifier.py` - Notification system (webhook, email, slack)
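An illustrative shape for these models. The real classes are Pydantic models; plain dataclasses stand in here so the sketch is self-contained, and all field names are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class SyncConfig:
    """Hypothetical shape of the sync configuration."""
    config_path: str
    check_interval_hours: int = 24
    notify: list[str] = field(default_factory=list)  # e.g. ["webhook", "slack"]


@dataclass
class ChangeReport:
    """Hypothetical shape of a change-detection report."""
    source: str
    changed_urls: list[str]

    @property
    def has_changes(self) -> bool:
        return bool(self.changed_urls)


report = ChangeReport(source="godot", changed_urls=["https://docs.godotengine.org/a"])
print(report.has_changes)  # True
```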
---
**CLI subcommands** — git-style in `cli/main.py`. Each delegates to a module's `main()` function.
## Git Workflow
### Branch Structure
- **`main`** — production, always stable, protected
- **`development`** — active development, default target for PRs
- **Feature branches** — your work, created from `development`
```
main (production)
│ (only maintainer merges)
development (integration) ← default branch for PRs
│ (all contributor PRs go here)
feature branches
```
### Creating a Feature Branch
```bash
# 1. Checkout development
git checkout development
git pull upstream development
# 2. Create feature branch
git checkout -b my-feature
# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature
# 4. Create PR targeting 'development' branch
```
---
## CI/CD Configuration
### GitHub Actions Workflows
All workflows are in `.github/workflows/`:
**`tests.yml`:**
- Runs on: push/PR to `main` and `development`
- Lint job: Ruff + MyPy
- Test matrix: Ubuntu + macOS, Python 3.10-3.12
- Coverage: Uploads to Codecov
**`release.yml`:**
- Triggered on version tags (`v*`)
- Builds and publishes to PyPI using `uv`
- Creates GitHub release with changelog
**`docker-publish.yml`:**
- Builds and publishes Docker images
- Multi-architecture support (linux/amd64, linux/arm64)
**`vector-db-export.yml`:**
- Tests vector database exports
**`scheduled-updates.yml`:**
- Scheduled sync monitoring
**`quality-metrics.yml`:**
- Quality metrics tracking
**`test-vector-dbs.yml`:**
- Vector database integration tests
### Pre-commit Checks (Manual)
```bash
# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # stop on first failure
```
---
## Security Considerations
### API Keys and Secrets
1. **Never commit API keys** to the repository
2. **Use environment variables:**
- `ANTHROPIC_API_KEY` - Claude AI
- `GOOGLE_API_KEY` - Google Gemini
- `OPENAI_API_KEY` - OpenAI
- `GITHUB_TOKEN` - GitHub API
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` - AWS S3
- `GOOGLE_APPLICATION_CREDENTIALS` - GCS
- `AZURE_STORAGE_CONNECTION_STRING` - Azure
3. **Configuration storage:**
- Stored at `~/.config/skill-seekers/config.json`
- Permissions: 600 (owner read/write only)
### Rate Limit Handling
- GitHub API has rate limits (5000 requests/hour for authenticated)
- The tool has built-in rate limit handling with retry logic
- Use `--non-interactive` flag for CI/CD environments
### Custom API Endpoints
Support for Claude-compatible APIs:
```bash
export ANTHROPIC_API_KEY=your-custom-api-key
export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1
```
---
## Common Development Tasks
### Adding a New CLI Command
1. Create module in `src/skill_seekers/cli/my_command.py`
2. Implement `main()` function with argument parsing
3. Add entry point in `pyproject.toml`:
```toml
[project.scripts]
skill-seekers-my-command = "skill_seekers.cli.my_command:main"
```
4. Add subcommand handler in `src/skill_seekers/cli/main.py`
5. Add argument parser in `src/skill_seekers/cli/parsers/`
6. Add tests in `tests/test_my_command.py`
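Step 2 might look like this minimal skeleton (flag names are placeholders, not an actual skill-seekers command):

```python
import argparse


def main(argv=None) -> int:
    """Entry point with its own argument parsing, as step 2 describes."""
    parser = argparse.ArgumentParser(prog="skill-seekers-my-command")
    parser.add_argument("--config", required=True, help="path to a JSON scraping config")
    parser.add_argument("--output", default="output/", help="output directory")
    args = parser.parse_args(argv)
    print(f"processing {args.config} -> {args.output}")
    return 0


# The [project.scripts] entry point in pyproject.toml would call main()
exit_code = main(["--config", "react.json"])
```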
### Adding a New Platform Adaptor
1. Create `src/skill_seekers/cli/adaptors/my_platform.py`
2. Inherit from `SkillAdaptor` base class
3. Implement required methods: `package()`, `upload()`, `format_skill_md()`
4. Register in `src/skill_seekers/cli/adaptors/__init__.py`
5. Add optional dependencies in `pyproject.toml`
6. Add tests in `tests/test_adaptors/`
### Adding an MCP Tool
1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py`
2. Register in `src/skill_seekers/mcp/server_fastmcp.py`
3. Add test in `tests/test_mcp_fastmcp.py`
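A sketch of steps 1-2. The real server registers tools with FastMCP decorators; a stand-in registry mimics the shape so this runs on its own, and the tool body is purely illustrative:

```python
TOOLS = {}


def tool(fn):
    """Stand-in for the FastMCP tool decorator: record the function by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def validate_config(config_path: str) -> dict:
    """Illustrative tool logic that would live in mcp/tools/config_tools.py."""
    ok = config_path.endswith(".json")
    return {"valid": ok, "path": config_path}


print(TOOLS["validate_config"]("configs/react.json"))
```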
### Adding Cloud Storage Provider
1. Create module in `src/skill_seekers/cli/storage/my_storage.py`
2. Inherit from `BaseStorageAdaptor` base class
3. Implement required methods: `upload_file()`, `download_file()`, `list_files()`, `delete_file()`
4. Register in `src/skill_seekers/cli/storage/__init__.py`
5. Add optional dependencies in `pyproject.toml`
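A self-contained sketch of the interface (method names come from step 3 above; the base class stub and the in-memory implementation are purely illustrative):

```python
from abc import ABC, abstractmethod


class BaseStorageAdaptor(ABC):  # stand-in for the real base class
    @abstractmethod
    def upload_file(self, local: str, remote: str) -> None: ...

    @abstractmethod
    def download_file(self, remote: str, local: str) -> None: ...

    @abstractmethod
    def list_files(self, prefix: str = "") -> list[str]: ...

    @abstractmethod
    def delete_file(self, remote: str) -> None: ...


class MemoryStorage(BaseStorageAdaptor):
    """In-memory fake implementing the four required methods."""

    def __init__(self):
        self._blobs: dict[str, str] = {}

    def upload_file(self, local, remote):
        self._blobs[remote] = local

    def download_file(self, remote, local):
        if remote not in self._blobs:
            raise FileNotFoundError(remote)

    def list_files(self, prefix=""):
        return sorted(k for k in self._blobs if k.startswith(prefix))

    def delete_file(self, remote):
        self._blobs.pop(remote, None)
```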
---
## Documentation
### Project Documentation (New Structure - v3.1.0+)
**Entry Points:**
- **README.md** - Main project documentation with navigation
- **docs/README.md** - Documentation hub
- **AGENTS.md** - This file, for AI coding agents
**Getting Started (for new users):**
- `docs/getting-started/01-installation.md` - Installation guide
- `docs/getting-started/02-quick-start.md` - 3 commands to first skill
- `docs/getting-started/03-your-first-skill.md` - Complete walkthrough
- `docs/getting-started/04-next-steps.md` - Where to go from here
**User Guides (common tasks):**
- `docs/user-guide/01-core-concepts.md` - How Skill Seekers works
- `docs/user-guide/02-scraping.md` - All scraping options
- `docs/user-guide/03-enhancement.md` - AI enhancement explained
- `docs/user-guide/04-packaging.md` - Export to platforms
- `docs/user-guide/05-workflows.md` - Enhancement workflows
- `docs/user-guide/06-troubleshooting.md` - Common issues
**Reference (technical details):**
- `docs/reference/CLI_REFERENCE.md` - Complete command reference (20 commands)
- `docs/reference/MCP_REFERENCE.md` - MCP tools reference (33 tools)
- `docs/reference/CONFIG_FORMAT.md` - JSON configuration specification
- `docs/reference/ENVIRONMENT_VARIABLES.md` - All environment variables
**Advanced (power user topics):**
- `docs/advanced/mcp-server.md` - MCP server setup
- `docs/advanced/mcp-tools.md` - Advanced MCP usage
- `docs/advanced/custom-workflows.md` - Creating custom workflows
- `docs/advanced/multi-source.md` - Multi-source scraping
### Configuration Documentation
Preset configs are in `configs/` directory:
- `godot.json` / `godot_unified.json` - Godot Engine
- `blender.json` / `blender-unified.json` - Blender
- `claude-code.json` - Claude Code
- `httpx_comprehensive.json` - HTTPX library
- `medusa-mercurjs.json` - Medusa/MercurJS
- `astrovalley_unified.json` - Astrovalley
- `react.json` - React documentation
- `configs/integrations/` - Integration-specific configs
---
## Key Dependencies
### Core Dependencies (Required)
| Package | Version | Purpose |
|---------|---------|---------|
| `requests` | >=2.32.5 | HTTP requests |
| `beautifulsoup4` | >=4.14.2 | HTML parsing |
| `PyGithub` | >=2.5.0 | GitHub API |
| `GitPython` | >=3.1.40 | Git operations |
| `httpx` | >=0.28.1 | Async HTTP |
| `anthropic` | >=0.76.0 | Claude AI API |
| `PyMuPDF` | >=1.24.14 | PDF processing |
| `Pillow` | >=11.0.0 | Image processing |
| `pytesseract` | >=0.3.13 | OCR |
| `pydantic` | >=2.12.3 | Data validation |
| `pydantic-settings` | >=2.11.0 | Settings management |
| `click` | >=8.3.0 | CLI framework |
| `Pygments` | >=2.19.2 | Syntax highlighting |
| `pathspec` | >=0.12.1 | Path matching |
| `networkx` | >=3.0 | Graph operations |
| `schedule` | >=1.2.0 | Scheduled tasks |
| `python-dotenv` | >=1.1.1 | Environment variables |
| `jsonschema` | >=4.25.1 | JSON validation |
| `PyYAML` | >=6.0 | YAML parsing |
| `langchain` | >=1.2.10 | LangChain integration |
| `llama-index` | >=0.14.15 | LlamaIndex integration |
### Optional Dependencies
| Feature | Package | Install Command |
|---------|---------|-----------------|
| MCP Server | `mcp>=1.25,<2` | `pip install -e ".[mcp]"` |
| Google Gemini | `google-generativeai>=0.8.0` | `pip install -e ".[gemini]"` |
| OpenAI | `openai>=1.0.0` | `pip install -e ".[openai]"` |
| AWS S3 | `boto3>=1.34.0` | `pip install -e ".[s3]"` |
| Google Cloud Storage | `google-cloud-storage>=2.10.0` | `pip install -e ".[gcs]"` |
| Azure Blob Storage | `azure-storage-blob>=12.19.0` | `pip install -e ".[azure]"` |
| Word Documents | `mammoth>=1.6.0`, `python-docx>=1.1.0` | `pip install -e ".[docx]"` |
| Video (lightweight) | `yt-dlp>=2024.12.0`, `youtube-transcript-api>=1.2.0` | `pip install -e ".[video]"` |
| Video (full) | +`faster-whisper`, `scenedetect`, `opencv-python-headless` (`easyocr` now installed via `--setup`) | `pip install -e ".[video-full]"` |
| Video (GPU setup) | Auto-detects GPU, installs PyTorch + easyocr + all visual deps | `skill-seekers video --setup` |
| Chroma DB | `chromadb>=0.4.0` | `pip install -e ".[chroma]"` |
| Weaviate | `weaviate-client>=3.25.0` | `pip install -e ".[weaviate]"` |
| Pinecone | `pinecone>=5.0.0` | `pip install -e ".[pinecone]"` |
| Embedding Server | `fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0` | `pip install -e ".[embedding]"` |
### Dev Dependencies (in dependency-groups)
| Package | Version | Purpose |
|---------|---------|---------|
| `pytest` | >=8.4.2 | Testing framework |
| `pytest-asyncio` | >=0.24.0 | Async test support |
| `pytest-cov` | >=7.0.0 | Coverage |
| `coverage` | >=7.11.0 | Coverage reporting |
| `ruff` | >=0.14.13 | Linting/formatting |
| `mypy` | >=1.19.1 | Type checking |
| `psutil` | >=5.9.0 | Process utilities for testing |
| `numpy` | >=1.24.0 | Numerical operations |
| `starlette` | >=0.31.0 | HTTP transport testing |
| `httpx` | >=0.24.0 | HTTP client for testing |
| `boto3` | >=1.26.0 | AWS S3 testing |
| `google-cloud-storage` | >=2.10.0 | GCS testing |
| `azure-storage-blob` | >=12.17.0 | Azure testing |
---
## Troubleshooting
### Common Issues
**ImportError: No module named 'skill_seekers'**
- Solution: Run `pip install -e .`
**Tests failing with "package not installed"**
- Solution: Ensure you ran `pip install -e .` in the correct virtual environment
**MCP server import errors**
- Solution: Install with `pip install -e ".[mcp]"`
**Type checking failures**
- MyPy is configured to be lenient (gradual typing)
- Focus on critical paths, not full coverage
**Docker build failures**
- Ensure you have BuildKit enabled: `DOCKER_BUILDKIT=1`
- Check that all submodules are initialized: `git submodule update --init`
**Rate limit errors from GitHub**
- Set `GITHUB_TOKEN` environment variable for authenticated requests
- Improves rate limit from 60 to 5000 requests/hour
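As an illustration, the token-based auth can be sketched like this (header names follow the GitHub REST API; the helper itself is hypothetical, not project code):

```python
import os

def github_headers() -> dict[str, str]:
    """Build GitHub API request headers, adding auth when GITHUB_TOKEN is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get 5,000 req/hour instead of 60
        headers["Authorization"] = f"Bearer {token}"
    return headers
```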
### Getting Help
- Check **TROUBLESHOOTING.md** for detailed solutions
- Review **docs/FAQ.md** for common questions
- Visit https://skillseekersweb.com/ for documentation
- Open an issue on GitHub with:
- Clear title and description
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version)
- Error messages and stack traces
---
## Environment Variables Reference
| Variable | Purpose | Required For |
|----------|---------|--------------|
| `ANTHROPIC_API_KEY` | Claude AI API access | Claude enhancement/upload |
| `GOOGLE_API_KEY` | Google Gemini API access | Gemini enhancement/upload |
| `OPENAI_API_KEY` | OpenAI API access | OpenAI enhancement/upload |
| `GITHUB_TOKEN` | GitHub API authentication | GitHub scraping (recommended) |
| `AWS_ACCESS_KEY_ID` | AWS S3 authentication | S3 cloud storage |
| `AWS_SECRET_ACCESS_KEY` | AWS S3 authentication | S3 cloud storage |
| `GOOGLE_APPLICATION_CREDENTIALS` | GCS authentication path | GCS cloud storage |
| `AZURE_STORAGE_CONNECTION_STRING` | Azure Blob authentication | Azure cloud storage |
| `ANTHROPIC_BASE_URL` | Custom Claude endpoint | Custom API endpoints |
| `SKILL_SEEKERS_HOME` | Data directory path | Docker/runtime |
| `SKILL_SEEKERS_OUTPUT` | Output directory path | Docker/runtime |
---
## Version Management
The version is defined in `pyproject.toml` and dynamically read by `src/skill_seekers/_version.py`:
```python
# _version.py reads from pyproject.toml
__version__ = get_version() # Returns version from pyproject.toml
```
**To update version:**
1. Edit `version` in `pyproject.toml`
2. The `_version.py` file will automatically pick up the new version
---
## Configuration File Format
Skill Seekers uses JSON configuration files to define scraping targets. Example structure:
```json
{
"name": "godot",
"description": "Godot Engine documentation",
"merge_mode": "claude-enhanced",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.godotengine.org/en/stable/",
"extract_api": true,
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/search.html", "/_static/"]
},
"categories": {
"getting_started": ["introduction", "getting_started"],
"scripting": ["scripting", "gdscript"]
},
"rate_limit": 0.5,
"max_pages": 500
},
{
"type": "github",
"repo": "godotengine/godot",
"enable_codebase_analysis": true,
"code_analysis_depth": "deep",
"fetch_issues": true,
"max_issues": 100
}
]
}
```
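A minimal loader sketch for configs of this shape (field names as in the example above; the project's actual config validator performs far more thorough per-type checks):

```python
import json

REQUIRED_TOP_LEVEL = ("name", "sources")

def load_config(text: str) -> dict:
    """Parse a Skill Seekers-style JSON config and check its basic shape."""
    cfg = json.loads(text)
    for field in REQUIRED_TOP_LEVEL:
        if field not in cfg:
            raise ValueError(f"config missing required field: {field}")
    for i, src in enumerate(cfg["sources"]):
        if "type" not in src:
            raise ValueError(f"sources[{i}] missing 'type'")
    return cfg

sample = '{"name": "demo", "sources": [{"type": "github", "repo": "octocat/hello"}]}'
print(load_config(sample)["sources"][0]["type"])
```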
---
## Workflow Presets
Skill Seekers includes 66 YAML workflow presets for AI enhancement in `src/skill_seekers/workflows/`:
**Built-in presets:**
- `default.yaml` - Standard enhancement workflow
- `minimal.yaml` - Fast, minimal enhancement
- `security-focus.yaml` - Security-focused review
- `architecture-comprehensive.yaml` - Deep architecture analysis
- `api-documentation.yaml` - API documentation focus
- And 61 more specialized presets...
**Usage:**
```bash
# Apply a preset
skill-seekers create ./my-project --enhance-workflow security-focus
# Chain multiple presets
skill-seekers create ./my-project --enhance-workflow security-focus --enhance-workflow minimal
# Manage presets
skill-seekers workflows list
skill-seekers workflows show security-focus
skill-seekers workflows copy security-focus
```
---
*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*
*Last updated: 2026-03-01*
GitHub Actions (`.github/workflows/tests.yml`): ruff + mypy lint job, then pytest matrix (Ubuntu + macOS, Python 3.10-3.12) with Codecov upload.

View File

@@ -8,6 +8,77 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
#### 10 New Skill Source Types (17 total)
Skill Seekers now supports 17 source types — up from 7. Every new type is fully integrated into the CLI (`skill-seekers <type>`), `create` command auto-detection, unified multi-source configs, config validation, the MCP server, and the skill builder.
- **Jupyter Notebook** — `skill-seekers jupyter --notebook file.ipynb` or `skill-seekers create file.ipynb`
- Extracts markdown cells, code cells with outputs, kernel metadata, imports, and language detection
- Handles single files and directories of notebooks; filters `.ipynb_checkpoints`
- Optional dependency: `pip install "skill-seekers[jupyter]"` (nbformat)
- Entry point: `skill-seekers-jupyter`
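The cell extraction described above can be sketched with plain `json` (the real scraper uses `nbformat`; this simplified illustration also ignores raw cells):

```python
import json

def split_cells(notebook_json: str) -> tuple[list[str], list[str]]:
    """Separate a notebook's markdown and code cell sources."""
    nb = json.loads(notebook_json)
    md, code = [], []
    for cell in nb.get("cells", []):
        text = "".join(cell.get("source", []))
        (md if cell["cell_type"] == "markdown" else code).append(text)
    return md, code

sample = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Intro"]},
    {"cell_type": "code", "source": ["print('hi')"]},
]})
print(split_cells(sample))
```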
- **Local HTML** — `skill-seekers html --html-path file.html` or `skill-seekers create file.html`
- Parses HTML using BeautifulSoup with smart main content detection (`<article>`, `<main>`, `.content`, largest div)
- Extracts headings, code blocks, tables (to markdown), images, links; converts inline HTML to markdown
- Handles single files and directories; supports `.html`, `.htm`, `.xhtml` extensions
- No extra dependencies (BeautifulSoup is a core dep)
- **OpenAPI/Swagger** — `skill-seekers openapi --spec spec.yaml` or `skill-seekers create spec.yaml`
- Parses OpenAPI 3.0/3.1 and Swagger 2.0 specs from YAML or JSON (local files or URLs via `--spec-url`)
- Extracts endpoints, parameters, request/response schemas, security schemes, tags
- Resolves `$ref` references with circular reference protection; handles `allOf`/`oneOf`/`anyOf`
- Groups endpoints by tags; generates comprehensive API reference markdown
- Source detection sniffs YAML file content for `openapi:` or `swagger:` keys (avoids false positives on non-API YAML files)
- No extra dependency required (`pyyaml` is already a core dep; an import guard was added for safety)
- **AsciiDoc** — `skill-seekers asciidoc --asciidoc-path file.adoc` or `skill-seekers create file.adoc`
- Regex-based parser (no external library required) with optional `asciidoc` library support
- Extracts headings (= through =====), `[source,lang]` code blocks, `|===` tables, admonitions (NOTE/TIP/WARNING/IMPORTANT/CAUTION), and `include::` directives
- Converts AsciiDoc formatting to markdown; handles single files and directories
- Optional dependency: `pip install "skill-seekers[asciidoc]"` (asciidoc library for advanced rendering)
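A regex sketch of the `[source,lang]` block extraction (simplified; the project's parser handles more delimiter variants):

```python
import re

_CODE_BLOCK = re.compile(r"\[source,(\w+)\]\s*\n----\n(.*?)\n----", re.DOTALL)

def extract_code_blocks(text: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for [source,lang] ---- delimited blocks."""
    return [(m.group(1), m.group(2)) for m in _CODE_BLOCK.finditer(text)]

doc = "== Title\n\n[source,python]\n----\nprint('hi')\n----\n"
print(extract_code_blocks(doc))
```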
- **PowerPoint (.pptx)** — `skill-seekers pptx --pptx file.pptx` or `skill-seekers create file.pptx`
- Extracts slide text, speaker notes, tables, images (with alt text), and grouped shapes
- Detects code blocks by monospace font analysis (30+ font families)
- Groups slides into sections by layout type; handles single files and directories
- Optional dependency: `pip install "skill-seekers[pptx]"` (python-pptx)
- **RSS/Atom Feeds** — `skill-seekers rss --feed-url <url>` / `--feed-path file.rss` or `skill-seekers create feed.rss`
- Parses RSS 2.0, RSS 1.0, and Atom feeds via feedparser
- Optionally follows article links (`--follow-links`, default on) to scrape full page content using BeautifulSoup
- Extracts article titles, summaries, authors, dates, categories; configurable `--max-articles` (default 50)
- Source detection matches `.rss` and `.atom` extensions (`.xml` excluded to avoid false positives)
- Optional dependency: `pip install "skill-seekers[rss]"` (feedparser)
- **Man Pages** — `skill-seekers manpage --man-names git,curl` / `--man-path dir/` or `skill-seekers create git.1`
- Extracts man pages by running the `man` command via subprocess or reading `.1`-`.8`/`.man` files directly
- Handles gzip/bzip2/xz compressed man files; strips troff/groff formatting (backspace overstriking, macros, font escapes)
- Parses structured sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, EXAMPLES, SEE ALSO)
- Source detection uses basename heuristic to avoid false positives on log rotation files (e.g., `access.log.1`)
- No external dependencies (stdlib only)
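The overstrike stripping and the basename heuristic can be sketched as (hypothetical helpers, not the project's exact code):

```python
import os
import re

def strip_overstrike(text: str) -> str:
    """Remove nroff backspace overstriking (bold and underline sequences)."""
    return re.sub(".\x08", "", text)

def looks_like_man_file(path: str) -> bool:
    """True for names like 'git.1' but not rotated logs like 'access.log.1'."""
    base = os.path.basename(path)
    if base.endswith(".man"):
        return True
    stem, _, ext = base.rpartition(".")
    return ext.isdigit() and 1 <= int(ext) <= 8 and "." not in stem

print(strip_overstrike("N\x08NA\x08AM\x08ME\x08E"))
print(looks_like_man_file("git.1"), looks_like_man_file("access.log.1"))
```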
- **Confluence** — `skill-seekers confluence --base-url <url> --space-key <key>` or `--export-path dir/`
- API mode: fetches pages from Confluence REST API with pagination (`atlassian-python-api`)
- Export mode: parses Confluence HTML/XML export directories
- Extracts page content, code/panel/info/warning macros, page hierarchy, tables
- Optional dependency: `pip install "skill-seekers[confluence]"` (atlassian-python-api)
- **Notion** — `skill-seekers notion --database-id <id>` / `--page-id <id>` or `--export-path dir/`
- API mode: fetches pages via Notion API with support for 20+ block types (paragraph, heading, code, callout, toggle, table, etc.)
- Export mode: parses Notion Markdown/CSV export directories
- Extracts rich text with annotations (bold, italic, code, links), 16+ property types for database entries
- Optional dependency: `pip install "skill-seekers[notion]"` (notion-client)
- **Slack/Discord Chat** — `skill-seekers chat --export-path dir/` or `--token <token> --channel <channel>`
- Slack: parses workspace JSON exports or fetches via Slack Web API (`slack_sdk`)
- Discord: parses DiscordChatExporter JSON or fetches via Discord HTTP API
- Extracts messages, code snippets (fenced blocks), shared URLs, threads, reactions, attachments
- Generates per-channel summaries and topic categorization
- Optional dependency: `pip install "skill-seekers[chat]"` (slack-sdk)
#### EPUB Unified Pipeline Integration
- **EPUB (.epub) input support** via `skill-seekers create book.epub` or `skill-seekers epub --epub book.epub`
- Extracts chapters, metadata (Dublin Core), code blocks, images, and tables from EPUB 2 and EPUB 3 files
- DRM detection with clear error messages (Adobe ADEPT, Apple FairPlay, Readium LCP)
@@ -16,6 +87,61 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `--help-epub` flag for EPUB-specific help
- Optional dependency: `pip install "skill-seekers[epub]"` (ebooklib)
- 107 tests across 14 test classes
- **EPUB added to unified scraper** — `_scrape_epub()` method, `scraped_data["epub"]`, config validation (`_validate_epub_source`), and dry-run display. Previously EPUB worked standalone but was missing from multi-source configs.
#### Unified Skill Builder — Generic Merge System
- **`_generic_merge()`** — Priority-based section merge for any combination of source types not covered by existing pairwise synthesis (docs+github, docs+pdf, etc.). Produces YAML frontmatter + source-attributed sections.
- **`_append_extra_sources()`** — Appends additional source type content (e.g., Jupyter + PPTX) to pairwise-synthesized SKILL.md.
- **`_generate_generic_references()`** — Generates `references/<type>/index.md` for any source type, with ID resolution fallback chain.
- **`_SOURCE_LABELS`** dict — Human-readable labels for all 17 source types used in merge attribution.
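The priority-based merge can be sketched as follows (labels and priority order are illustrative, abbreviated from the 17 types):

```python
_SOURCE_LABELS = {"documentation": "Documentation", "github": "GitHub Repository",
                  "jupyter": "Jupyter Notebooks"}  # subset for illustration
_PRIORITY = ["documentation", "github", "jupyter"]

def generic_merge(sections: dict[str, str]) -> str:
    """Merge per-source content in priority order with source attribution."""
    parts = []
    for stype in _PRIORITY:
        if stype in sections:
            parts.append(f"## {_SOURCE_LABELS[stype]}\n\n{sections[stype]}")
    return "\n\n".join(parts)

print(generic_merge({"jupyter": "Notebook notes.", "documentation": "Docs overview."}))
```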
#### Config Validator Expansion
- **17 source types in `VALID_SOURCE_TYPES`** — All new types plus `word` and `video` now have per-type validation methods.
- **`_validate_word_source()`** — Validates `path` field for Word documents (was previously missing).
- **`_validate_video_source()`** — Validates `url`, `path`, or `playlist` field for video sources (was previously missing).
- **11 new `_validate_*_source()` methods** — One for each new type with appropriate required-field checks.
#### Source Detection Improvements
- **7 new file extension detections** in `SourceDetector.detect()``.ipynb`, `.html`/`.htm`, `.pptx`, `.adoc`/`.asciidoc`, `.rss`/`.atom`, `.1`-`.8`/`.man`, `.yaml`/`.yml` (with content sniffing)
- **`_looks_like_openapi()`** — Content sniffing for YAML files: only classifies as OpenAPI if the file contains `openapi:` or `swagger:` key in first 20 lines (prevents false positives on docker-compose, Ansible, Kubernetes manifests, etc.)
- **Man page basename heuristic** — `.1`-`.8` extensions only detected as man pages if the basename has no dots (e.g., `git.1` matches but `access.log.1` does not)
- **`.xml` excluded from RSS detection** — Too generic; only `.rss` and `.atom` trigger RSS detection
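The content sniffing can be sketched like this (the real method takes a file path; a string is used here for self-containment):

```python
def looks_like_openapi(text: str, max_lines: int = 20) -> bool:
    """Classify YAML content as an OpenAPI/Swagger spec only if a top-level
    'openapi:' or 'swagger:' key appears within the first max_lines lines."""
    for line in text.splitlines()[:max_lines]:
        if line.startswith(("openapi:", "swagger:")):
            return True
    return False

print(looks_like_openapi("openapi: 3.1.0\ninfo:\n  title: Demo\n"))
print(looks_like_openapi("version: '3'\nservices:\n  web:\n    image: nginx\n"))
```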
#### MCP Server Integration
- **`scrape_generic` tool** — New MCP tool handles all 10 new source types via subprocess with per-type flag mapping
- **`_PATH_FLAGS` / `_URL_FLAGS` dicts** — Correct flag routing for each source type (e.g., jupyter→`--notebook`, html→`--html-path`, rss→`--feed-url`)
- **`GENERIC_SOURCE_TYPES` tuple** — Lists all 10 new types for validation
- **Config validation display** — `validate_config` tool now shows source details for all new types
- **Tool count updated** — 33 → 34 tools (scraping tools 10 → 11)
#### CLI Wiring
- **10 new CLI subcommands** — `jupyter`, `html`, `openapi`, `asciidoc`, `pptx`, `rss`, `manpage`, `confluence`, `notion`, `chat` in `COMMAND_MODULES`
- **10 new argument modules** — `arguments/{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}.py` with per-type `*_ARGUMENTS` dicts
- **10 new parser modules** — `parsers/{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}_parser.py` with `SubcommandParser` implementations
- **`create` command routing** — `_route_generic()` method for all new types with correct module names and CLI flags
- **10 new entry points** in pyproject.toml — `skill-seekers-{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}`
- **7 new optional dependency groups** in pyproject.toml — `[jupyter]`, `[asciidoc]`, `[pptx]`, `[confluence]`, `[notion]`, `[rss]`, `[chat]`
- **`[all]` group updated** — Includes all 7 new optional dependencies
#### Workflow & Documentation
- **`complex-merge.yaml`** — New 7-stage AI-powered workflow for complex multi-source merging (source inventory → cross-reference → conflict detection → priority merge → gap analysis → synthesis → quality check)
- **AGENTS.md rewritten** — Updated with all 17 source types, scraper pattern docs, project layout, and key pattern documentation
- **77 new integration tests** in `test_new_source_types.py` — Source detection, config validation, generic merge, CLI wiring, validation, and create command routing
### Fixed
- **Config validator missing `word` and `video` dispatch** — `_validate_source()` had no `elif` branches for `word` or `video` types, silently skipping validation. Added dispatch entries and `_validate_word_source()` / `_validate_video_source()` methods.
- **`openapi_scraper.py` unconditional `import yaml`** — Would crash at import time if pyyaml not installed. Added `try/except ImportError` guard with `YAML_AVAILABLE` flag and `_check_yaml_deps()` helper.
- **`asciidoc_scraper.py` missing standard arguments** — `main()` manually defined args instead of using `add_asciidoc_arguments()`. Refactored to use shared argument definitions + added enhancement workflow integration.
- **`pptx_scraper.py` missing standard arguments** — Same issue. Refactored to use `add_pptx_arguments()`.
- **`chat_scraper.py` missing standard arguments** — Same issue. Refactored to use `add_chat_arguments()`.
- **`notion_scraper.py` missing `run_workflows` call** — `--enhance-workflow` flags were silently ignored. Added workflow runner integration.
- **`openapi_scraper.py` return type `None`** — `main()` returned `None` instead of `int`. Fixed to `return 0` on success, matching all other scrapers.
- **MCP `scrape_generic_tool` flag mismatch** — Was passing `--path`/`--url` as generic flags, but every scraper expects its own flag name (e.g., `--notebook`, `--html-path`, `--spec`). All 10 source types would have failed at runtime. Fixed with per-type `_PATH_FLAGS` and `_URL_FLAGS` mappings.
- **Word scraper `docx_id` key mismatch** — Unified scraper data dict used `docx_id` but generic reference generation looked for `word_id`. Added `word_id` alias.
- **`main.py` docstring stale** — Missing all 10 new commands. Updated to list all 27 commands.
- **`source_detector.py` module docstring stale** — Described only 5 source types. Updated to describe 14+ detected types.
- **`manpage_parser.py` docstring referenced wrong file** — Said `manpage_scraper.py` but actual file is `man_scraper.py`. Fixed.
- **Parser registry test count** — Updated expected count from 25 to 35 for 10 new parsers.
## [3.2.0] - 2026-03-01

View File

@@ -168,6 +168,35 @@ all-cloud = [
"azure-storage-blob>=12.19.0",
]
# New source type dependencies (v3.2.0+)
jupyter = [
"nbformat>=5.9.0",
]
asciidoc = [
"asciidoc>=10.0.0",
]
pptx = [
"python-pptx>=0.6.21",
]
confluence = [
"atlassian-python-api>=3.41.0",
]
notion = [
"notion-client>=2.0.0",
]
rss = [
"feedparser>=6.0.0",
]
chat = [
"slack-sdk>=3.27.0",
]
# Embedding server support
embedding = [
"fastapi>=0.109.0",
@@ -204,6 +233,14 @@ all = [
"sentence-transformers>=2.3.0",
"numpy>=1.24.0",
"voyageai>=0.2.0",
# New source types (v3.2.0+)
"nbformat>=5.9.0",
"asciidoc>=10.0.0",
"python-pptx>=0.6.21",
"atlassian-python-api>=3.41.0",
"notion-client>=2.0.0",
"feedparser>=6.0.0",
"slack-sdk>=3.27.0",
]
[project.urls]
@@ -253,6 +290,18 @@ skill-seekers-quality = "skill_seekers.cli.quality_metrics:main"
skill-seekers-workflows = "skill_seekers.cli.workflows_command:main"
skill-seekers-sync-config = "skill_seekers.cli.sync_config:main"
# New source type entry points (v3.2.0+)
skill-seekers-jupyter = "skill_seekers.cli.jupyter_scraper:main"
skill-seekers-html = "skill_seekers.cli.html_scraper:main"
skill-seekers-openapi = "skill_seekers.cli.openapi_scraper:main"
skill-seekers-asciidoc = "skill_seekers.cli.asciidoc_scraper:main"
skill-seekers-pptx = "skill_seekers.cli.pptx_scraper:main"
skill-seekers-rss = "skill_seekers.cli.rss_scraper:main"
skill-seekers-manpage = "skill_seekers.cli.man_scraper:main"
skill-seekers-confluence = "skill_seekers.cli.confluence_scraper:main"
skill-seekers-notion = "skill_seekers.cli.notion_scraper:main"
skill-seekers-chat = "skill_seekers.cli.chat_scraper:main"
[tool.setuptools]
package-dir = {"" = "src"}

View File

@@ -0,0 +1,68 @@
"""AsciiDoc command argument definitions.
This module defines ALL arguments for the asciidoc command in ONE place.
Both asciidoc_scraper.py (standalone) and parsers/asciidoc_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# AsciiDoc-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
ASCIIDOC_ARGUMENTS: dict[str, dict[str, Any]] = {
"asciidoc_path": {
"flags": ("--asciidoc-path",),
"kwargs": {
"type": str,
"help": "Path to AsciiDoc file or directory containing .adoc files",
"metavar": "PATH",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_asciidoc_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all asciidoc command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds AsciiDoc-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for AsciiDoc.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for AsciiDoc
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for AsciiDoc), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# AsciiDoc-specific args
for arg_name, arg_def in ASCIIDOC_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,102 @@
"""Chat command argument definitions.
This module defines ALL arguments for the chat command in ONE place.
Both chat_scraper.py (standalone) and parsers/chat_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# Chat-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
CHAT_ARGUMENTS: dict[str, dict[str, Any]] = {
"export_path": {
"flags": ("--export-path",),
"kwargs": {
"type": str,
"help": "Path to chat export directory or file",
"metavar": "PATH",
},
},
"platform": {
"flags": ("--platform",),
"kwargs": {
"type": str,
"choices": ["slack", "discord"],
"default": "slack",
"help": "Chat platform type (default: slack)",
},
},
"token": {
"flags": ("--token",),
"kwargs": {
"type": str,
"help": "API token for chat platform authentication",
"metavar": "TOKEN",
},
},
"channel": {
"flags": ("--channel",),
"kwargs": {
"type": str,
"help": "Channel name or ID to extract from",
"metavar": "CHANNEL",
},
},
"max_messages": {
"flags": ("--max-messages",),
"kwargs": {
"type": int,
"default": 10000,
"help": "Maximum number of messages to extract (default: 10000)",
"metavar": "N",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_chat_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all chat command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds Chat-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for Chat.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for Chat
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for Chat), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# Chat-specific args
for arg_name, arg_def in CHAT_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,109 @@
"""Confluence command argument definitions.
This module defines ALL arguments for the confluence command in ONE place.
Both confluence_scraper.py (standalone) and parsers/confluence_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# Confluence-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
CONFLUENCE_ARGUMENTS: dict[str, dict[str, Any]] = {
"base_url": {
"flags": ("--base-url",),
"kwargs": {
"type": str,
"help": "Confluence instance base URL",
"metavar": "URL",
},
},
"space_key": {
"flags": ("--space-key",),
"kwargs": {
"type": str,
"help": "Confluence space key to extract from",
"metavar": "KEY",
},
},
"export_path": {
"flags": ("--export-path",),
"kwargs": {
"type": str,
"help": "Path to Confluence HTML/XML export directory",
"metavar": "PATH",
},
},
"username": {
"flags": ("--username",),
"kwargs": {
"type": str,
"help": "Confluence username for API authentication",
"metavar": "USER",
},
},
"token": {
"flags": ("--token",),
"kwargs": {
"type": str,
"help": "Confluence API token for authentication",
"metavar": "TOKEN",
},
},
"max_pages": {
"flags": ("--max-pages",),
"kwargs": {
"type": int,
"default": 500,
"help": "Maximum number of pages to extract (default: 500)",
"metavar": "N",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_confluence_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all confluence command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds Confluence-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for Confluence.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for Confluence
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for Confluence), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# Confluence-specific args
for arg_name, arg_def in CONFLUENCE_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -549,6 +549,121 @@ CONFIG_ARGUMENTS: dict[str, dict[str, Any]] = {
# For unified config files, use `skill-seekers unified --fresh` directly.
}
# New source type arguments (v3.2.0+)
# These are minimal dicts since most flags are handled by each scraper's own argument module.
# The create command only needs the primary input flag for routing.
JUPYTER_ARGUMENTS: dict[str, dict[str, Any]] = {
"notebook": {
"flags": ("--notebook",),
"kwargs": {"type": str, "help": "Jupyter Notebook file path (.ipynb)", "metavar": "PATH"},
},
}
HTML_ARGUMENTS: dict[str, dict[str, Any]] = {
"html_path": {
"flags": ("--html-path",),
"kwargs": {"type": str, "help": "Local HTML file or directory path", "metavar": "PATH"},
},
}
OPENAPI_ARGUMENTS: dict[str, dict[str, Any]] = {
"spec": {
"flags": ("--spec",),
"kwargs": {"type": str, "help": "OpenAPI/Swagger spec file path", "metavar": "PATH"},
},
"spec_url": {
"flags": ("--spec-url",),
"kwargs": {"type": str, "help": "OpenAPI/Swagger spec URL", "metavar": "URL"},
},
}
ASCIIDOC_ARGUMENTS: dict[str, dict[str, Any]] = {
"asciidoc_path": {
"flags": ("--asciidoc-path",),
"kwargs": {"type": str, "help": "AsciiDoc file or directory path", "metavar": "PATH"},
},
}
PPTX_ARGUMENTS: dict[str, dict[str, Any]] = {
"pptx": {
"flags": ("--pptx",),
"kwargs": {"type": str, "help": "PowerPoint file path (.pptx)", "metavar": "PATH"},
},
}
RSS_ARGUMENTS: dict[str, dict[str, Any]] = {
"feed_url": {
"flags": ("--feed-url",),
"kwargs": {"type": str, "help": "RSS/Atom feed URL", "metavar": "URL"},
},
"feed_path": {
"flags": ("--feed-path",),
"kwargs": {"type": str, "help": "RSS/Atom feed file path", "metavar": "PATH"},
},
}
MANPAGE_ARGUMENTS: dict[str, dict[str, Any]] = {
"man_names": {
"flags": ("--man-names",),
"kwargs": {
"type": str,
"help": "Comma-separated man page names (e.g., 'git,curl')",
"metavar": "NAMES",
},
},
"man_path": {
"flags": ("--man-path",),
"kwargs": {"type": str, "help": "Directory of man page files", "metavar": "PATH"},
},
}
CONFLUENCE_ARGUMENTS: dict[str, dict[str, Any]] = {
"conf_base_url": {
"flags": ("--conf-base-url",),
"kwargs": {"type": str, "help": "Confluence base URL", "metavar": "URL"},
},
"space_key": {
"flags": ("--space-key",),
"kwargs": {"type": str, "help": "Confluence space key", "metavar": "KEY"},
},
"conf_export_path": {
"flags": ("--conf-export-path",),
"kwargs": {"type": str, "help": "Confluence export directory", "metavar": "PATH"},
},
}
NOTION_ARGUMENTS: dict[str, dict[str, Any]] = {
"database_id": {
"flags": ("--database-id",),
"kwargs": {"type": str, "help": "Notion database ID", "metavar": "ID"},
},
"page_id": {
"flags": ("--page-id",),
"kwargs": {"type": str, "help": "Notion page ID", "metavar": "ID"},
},
"notion_export_path": {
"flags": ("--notion-export-path",),
"kwargs": {"type": str, "help": "Notion export directory", "metavar": "PATH"},
},
}
CHAT_ARGUMENTS: dict[str, dict[str, Any]] = {
"chat_export_path": {
"flags": ("--chat-export-path",),
"kwargs": {"type": str, "help": "Slack/Discord export directory", "metavar": "PATH"},
},
"platform": {
"flags": ("--platform",),
"kwargs": {
"type": str,
"choices": ["slack", "discord"],
"default": "slack",
"help": "Chat platform (default: slack)",
},
},
}
# =============================================================================
# TIER 3: ADVANCED/RARE ARGUMENTS
# =============================================================================
@@ -613,6 +728,17 @@ def get_source_specific_arguments(source_type: str) -> dict[str, dict[str, Any]]
"epub": EPUB_ARGUMENTS,
"video": VIDEO_ARGUMENTS,
"config": CONFIG_ARGUMENTS,
# New source types (v3.2.0+)
"jupyter": JUPYTER_ARGUMENTS,
"html": HTML_ARGUMENTS,
"openapi": OPENAPI_ARGUMENTS,
"asciidoc": ASCIIDOC_ARGUMENTS,
"pptx": PPTX_ARGUMENTS,
"rss": RSS_ARGUMENTS,
"manpage": MANPAGE_ARGUMENTS,
"confluence": CONFLUENCE_ARGUMENTS,
"notion": NOTION_ARGUMENTS,
"chat": CHAT_ARGUMENTS,
}
return source_args.get(source_type, {})
@@ -703,6 +829,24 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
for arg_name, arg_def in CONFIG_ARGUMENTS.items():
parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
# New source types (v3.2.0+)
_NEW_SOURCE_ARGS = {
"jupyter": JUPYTER_ARGUMENTS,
"html": HTML_ARGUMENTS,
"openapi": OPENAPI_ARGUMENTS,
"asciidoc": ASCIIDOC_ARGUMENTS,
"pptx": PPTX_ARGUMENTS,
"rss": RSS_ARGUMENTS,
"manpage": MANPAGE_ARGUMENTS,
"confluence": CONFLUENCE_ARGUMENTS,
"notion": NOTION_ARGUMENTS,
"chat": CHAT_ARGUMENTS,
}
for stype, sargs in _NEW_SOURCE_ARGS.items():
if mode in [stype, "all"]:
for arg_name, arg_def in sargs.items():
parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
# Add advanced arguments if requested
if mode in ["advanced", "all"]:
for arg_name, arg_def in ADVANCED_ARGUMENTS.items():

View File

@@ -0,0 +1,68 @@
"""HTML command argument definitions.
This module defines ALL arguments for the html command in ONE place.
Both html_scraper.py (standalone) and parsers/html_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# HTML-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
HTML_ARGUMENTS: dict[str, dict[str, Any]] = {
"html_path": {
"flags": ("--html-path",),
"kwargs": {
"type": str,
"help": "Path to HTML file or directory containing HTML files",
"metavar": "PATH",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_html_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all html command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds HTML-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for HTML.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for HTML
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for HTML), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# HTML-specific args
for arg_name, arg_def in HTML_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)
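The `add_html_arguments` helper above mutates an already-registered default by walking `parser._actions`. That technique can be sketched in isolation (standalone names, not the project's modules; note `_actions` is a private argparse attribute, so this relies on CPython's current behavior):

```python
import argparse

# Register a shared flag with one default, then override it per command,
# mirroring how add_html_arguments resets --enhance-level to 0.
parser = argparse.ArgumentParser()
parser.add_argument("--enhance-level", type=int, default=1)

# argparse keeps every registered option as an Action in parser._actions.
for action in parser._actions:
    if getattr(action, "dest", None) == "enhance_level":
        action.default = 0

args = parser.parse_args([])
```

An explicit `--enhance-level 2` on the command line still wins over the overridden default.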

View File

@@ -0,0 +1,68 @@
"""Jupyter Notebook command argument definitions.
This module defines ALL arguments for the jupyter command in ONE place.
Both jupyter_scraper.py (standalone) and parsers/jupyter_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# Jupyter-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
JUPYTER_ARGUMENTS: dict[str, dict[str, Any]] = {
"notebook": {
"flags": ("--notebook",),
"kwargs": {
"type": str,
"help": "Path to .ipynb file or directory containing notebooks",
"metavar": "PATH",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_jupyter_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all jupyter command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds Jupyter-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for Jupyter.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for Jupyter
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for Jupyter), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# Jupyter-specific args
for arg_name, arg_def in JUPYTER_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,84 @@
"""Man page command argument definitions.
This module defines ALL arguments for the manpage command in ONE place.
Both manpage_scraper.py (standalone) and parsers/manpage_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# ManPage-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
MANPAGE_ARGUMENTS: dict[str, dict[str, Any]] = {
"man_names": {
"flags": ("--man-names",),
"kwargs": {
"type": str,
"help": "Comma-separated list of man page names (e.g., 'ls,grep,find')",
"metavar": "NAMES",
},
},
"man_path": {
"flags": ("--man-path",),
"kwargs": {
"type": str,
"help": "Path to directory containing man page files",
"metavar": "PATH",
},
},
"sections": {
"flags": ("--sections",),
"kwargs": {
"type": str,
"help": "Comma-separated section numbers to include (e.g., '1,3,8')",
"metavar": "SECTIONS",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_manpage_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all manpage command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds ManPage-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for ManPage.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for ManPage
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for ManPage), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# ManPage-specific args
for arg_name, arg_def in MANPAGE_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,101 @@
"""Notion command argument definitions.
This module defines ALL arguments for the notion command in ONE place.
Both notion_scraper.py (standalone) and parsers/notion_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# Notion-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
NOTION_ARGUMENTS: dict[str, dict[str, Any]] = {
"database_id": {
"flags": ("--database-id",),
"kwargs": {
"type": str,
"help": "Notion database ID to extract from",
"metavar": "ID",
},
},
"page_id": {
"flags": ("--page-id",),
"kwargs": {
"type": str,
"help": "Notion page ID to extract from",
"metavar": "ID",
},
},
"export_path": {
"flags": ("--export-path",),
"kwargs": {
"type": str,
"help": "Path to Notion export directory",
"metavar": "PATH",
},
},
"token": {
"flags": ("--token",),
"kwargs": {
"type": str,
"help": "Notion integration token for API authentication",
"metavar": "TOKEN",
},
},
"max_pages": {
"flags": ("--max-pages",),
"kwargs": {
"type": int,
"default": 500,
"help": "Maximum number of pages to extract (default: 500)",
"metavar": "N",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_notion_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all notion command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds Notion-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for Notion.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for Notion
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for Notion), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# Notion-specific args
for arg_name, arg_def in NOTION_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,76 @@
"""OpenAPI command argument definitions.
This module defines ALL arguments for the openapi command in ONE place.
Both openapi_scraper.py (standalone) and parsers/openapi_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# OpenAPI-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
OPENAPI_ARGUMENTS: dict[str, dict[str, Any]] = {
"spec": {
"flags": ("--spec",),
"kwargs": {
"type": str,
"help": "Path to OpenAPI/Swagger spec file",
"metavar": "PATH",
},
},
"spec_url": {
"flags": ("--spec-url",),
"kwargs": {
"type": str,
"help": "URL to OpenAPI/Swagger spec",
"metavar": "URL",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_openapi_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all openapi command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds OpenAPI-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for OpenAPI.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for OpenAPI
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for OpenAPI), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# OpenAPI-specific args
for arg_name, arg_def in OPENAPI_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,68 @@
"""PPTX command argument definitions.
This module defines ALL arguments for the pptx command in ONE place.
Both pptx_scraper.py (standalone) and parsers/pptx_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# PPTX-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
PPTX_ARGUMENTS: dict[str, dict[str, Any]] = {
"pptx": {
"flags": ("--pptx",),
"kwargs": {
"type": str,
"help": "Path to PowerPoint file (.pptx)",
"metavar": "PATH",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_pptx_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all pptx command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds PPTX-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for PPTX.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for PPTX
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for PPTX), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# PPTX-specific args
for arg_name, arg_def in PPTX_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)

View File

@@ -0,0 +1,101 @@
"""RSS command argument definitions.
This module defines ALL arguments for the rss command in ONE place.
Both rss_scraper.py (standalone) and parsers/rss_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# RSS-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
RSS_ARGUMENTS: dict[str, dict[str, Any]] = {
"feed_url": {
"flags": ("--feed-url",),
"kwargs": {
"type": str,
"help": "URL of the RSS/Atom feed",
"metavar": "URL",
},
},
"feed_path": {
"flags": ("--feed-path",),
"kwargs": {
"type": str,
"help": "Path to local RSS/Atom feed file",
"metavar": "PATH",
},
},
"follow_links": {
"flags": ("--follow-links",),
"kwargs": {
"action": "store_true",
"default": True,
"help": "Follow article links and extract full content (default: True)",
},
},
"no_follow_links": {
"flags": ("--no-follow-links",),
"kwargs": {
"action": "store_false",
"dest": "follow_links",
"help": "Do not follow article links; use feed summary only",
},
},
"max_articles": {
"flags": ("--max-articles",),
"kwargs": {
"type": int,
"default": 50,
"help": "Maximum number of articles to extract (default: 50)",
"metavar": "N",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
}
def add_rss_arguments(parser: argparse.ArgumentParser) -> None:
"""Add all rss command arguments to a parser.
Registers shared args (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
then adds RSS-specific args on top.
The default for --enhance-level is overridden to 0 (disabled) for RSS.
"""
# Shared universal args first
add_all_standard_arguments(parser)
# Override enhance-level default to 0 for RSS
for action in parser._actions:
if hasattr(action, "dest") and action.dest == "enhance_level":
action.default = 0
action.help = (
"AI enhancement level (auto-detects API vs LOCAL mode): "
"0=disabled (default for RSS), 1=SKILL.md only, "
"2=+architecture/config, 3=full enhancement. "
"Mode selection: uses API if ANTHROPIC_API_KEY is set, "
"otherwise LOCAL (Claude Code)"
)
# RSS-specific args
for arg_name, arg_def in RSS_ARGUMENTS.items():
flags = arg_def["flags"]
kwargs = arg_def["kwargs"]
parser.add_argument(*flags, **kwargs)
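The `--follow-links`/`--no-follow-links` pair above uses `store_true` plus a `store_false` action writing to the same `dest`, so either flag flips a single boolean. A minimal standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
# Positive flag: default True, so passing --follow-links just confirms it.
parser.add_argument("--follow-links", action="store_true", default=True)
# Negative flag: writes False into the same destination.
parser.add_argument("--no-follow-links", action="store_false", dest="follow_links")

default_run = parser.parse_args([])
negated_run = parser.parse_args(["--no-follow-links"])
```

On Python 3.9+, `argparse.BooleanOptionalAction` can generate such a `--flag`/`--no-flag` pair from one `add_argument` call.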

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -7,6 +7,19 @@ Validates unified config format that supports multiple sources:
- github (repository scraping)
- pdf (PDF document scraping)
- local (local codebase analysis)
- word (Word .docx document scraping)
- video (video transcript/visual extraction)
- epub (EPUB e-book extraction)
- jupyter (Jupyter Notebook extraction)
- html (local HTML file extraction)
- openapi (OpenAPI/Swagger spec extraction)
- asciidoc (AsciiDoc document extraction)
- pptx (PowerPoint presentation extraction)
- confluence (Confluence wiki extraction)
- notion (Notion page extraction)
- rss (RSS/Atom feed extraction)
- manpage (man page extraction)
- chat (Slack/Discord chat export extraction)
Legacy config format support removed in v2.11.0.
All configs must use unified format with 'sources' array.
@@ -27,7 +40,25 @@ class ConfigValidator:
"""
# Valid source types
VALID_SOURCE_TYPES = {"documentation", "github", "pdf", "local", "word", "video"}
VALID_SOURCE_TYPES = {
"documentation",
"github",
"pdf",
"local",
"word",
"video",
"epub",
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
}
# Valid merge modes
VALID_MERGE_MODES = {"rule-based", "claude-enhanced"}
@@ -159,6 +190,32 @@ class ConfigValidator:
self._validate_pdf_source(source, index)
elif source_type == "local":
self._validate_local_source(source, index)
elif source_type == "word":
self._validate_word_source(source, index)
elif source_type == "video":
self._validate_video_source(source, index)
elif source_type == "epub":
self._validate_epub_source(source, index)
elif source_type == "jupyter":
self._validate_jupyter_source(source, index)
elif source_type == "html":
self._validate_html_source(source, index)
elif source_type == "openapi":
self._validate_openapi_source(source, index)
elif source_type == "asciidoc":
self._validate_asciidoc_source(source, index)
elif source_type == "pptx":
self._validate_pptx_source(source, index)
elif source_type == "confluence":
self._validate_confluence_source(source, index)
elif source_type == "notion":
self._validate_notion_source(source, index)
elif source_type == "rss":
self._validate_rss_source(source, index)
elif source_type == "manpage":
self._validate_manpage_source(source, index)
elif source_type == "chat":
self._validate_chat_source(source, index)
def _validate_documentation_source(self, source: dict[str, Any], index: int):
"""Validate documentation source configuration."""
@@ -253,12 +310,126 @@ class ConfigValidator:
f"Source {index} (local): Invalid ai_mode '{ai_mode}'. Must be one of {self.VALID_AI_MODES}"
)
def _validate_word_source(self, source: dict[str, Any], index: int):
"""Validate Word document (.docx) source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (word): Missing required field 'path'")
word_path = source["path"]
if not Path(word_path).exists():
logger.warning(f"Source {index} (word): File not found: {word_path}")
def _validate_video_source(self, source: dict[str, Any], index: int):
"""Validate video source configuration."""
has_url = "url" in source
has_path = "path" in source
has_playlist = "playlist" in source
if not has_url and not has_path and not has_playlist:
raise ValueError(
f"Source {index} (video): Missing required field 'url', 'path', or 'playlist'"
)
def _validate_epub_source(self, source: dict[str, Any], index: int):
"""Validate EPUB source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (epub): Missing required field 'path'")
epub_path = source["path"]
if not Path(epub_path).exists():
logger.warning(f"Source {index} (epub): File not found: {epub_path}")
def _validate_jupyter_source(self, source: dict[str, Any], index: int):
"""Validate Jupyter Notebook source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (jupyter): Missing required field 'path'")
nb_path = source["path"]
if not Path(nb_path).exists():
logger.warning(f"Source {index} (jupyter): Path not found: {nb_path}")
def _validate_html_source(self, source: dict[str, Any], index: int):
"""Validate local HTML source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (html): Missing required field 'path'")
html_path = source["path"]
if not Path(html_path).exists():
logger.warning(f"Source {index} (html): Path not found: {html_path}")
def _validate_openapi_source(self, source: dict[str, Any], index: int):
"""Validate OpenAPI/Swagger source configuration."""
if "path" not in source and "url" not in source:
raise ValueError(f"Source {index} (openapi): Missing required field 'path' or 'url'")
if "path" in source and not Path(source["path"]).exists():
logger.warning(f"Source {index} (openapi): File not found: {source['path']}")
def _validate_asciidoc_source(self, source: dict[str, Any], index: int):
"""Validate AsciiDoc source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (asciidoc): Missing required field 'path'")
adoc_path = source["path"]
if not Path(adoc_path).exists():
logger.warning(f"Source {index} (asciidoc): Path not found: {adoc_path}")
def _validate_pptx_source(self, source: dict[str, Any], index: int):
"""Validate PowerPoint source configuration."""
if "path" not in source:
raise ValueError(f"Source {index} (pptx): Missing required field 'path'")
pptx_path = source["path"]
if not Path(pptx_path).exists():
logger.warning(f"Source {index} (pptx): File not found: {pptx_path}")
def _validate_confluence_source(self, source: dict[str, Any], index: int):
"""Validate Confluence source configuration."""
has_url = "url" in source or "base_url" in source
has_path = "path" in source
if not has_url and not has_path:
raise ValueError(
f"Source {index} (confluence): Missing required field 'url'/'base_url' "
f"(for API) or 'path' (for export)"
)
if has_url and "space_key" not in source and "path" not in source:
logger.warning(f"Source {index} (confluence): No 'space_key' specified for API mode")
def _validate_notion_source(self, source: dict[str, Any], index: int):
"""Validate Notion source configuration."""
has_url = "url" in source or "database_id" in source or "page_id" in source
has_path = "path" in source
if not has_url and not has_path:
raise ValueError(
f"Source {index} (notion): Missing required field 'url'/'database_id'/'page_id' "
f"(for API) or 'path' (for export)"
)
def _validate_rss_source(self, source: dict[str, Any], index: int):
"""Validate RSS/Atom feed source configuration."""
if "url" not in source and "path" not in source:
raise ValueError(f"Source {index} (rss): Missing required field 'url' or 'path'")
def _validate_manpage_source(self, source: dict[str, Any], index: int):
"""Validate man page source configuration."""
if "path" not in source and "names" not in source:
raise ValueError(f"Source {index} (manpage): Missing required field 'path' or 'names'")
if "path" in source and not Path(source["path"]).exists():
logger.warning(f"Source {index} (manpage): Path not found: {source['path']}")
def _validate_chat_source(self, source: dict[str, Any], index: int):
"""Validate Slack/Discord chat source configuration."""
has_path = "path" in source
has_api = "token" in source or "webhook_url" in source
has_channel = "channel" in source or "channel_id" in source
if not has_path and not has_api:
raise ValueError(
f"Source {index} (chat): Missing required field 'path' (for export) "
f"or 'token' (for API)"
)
if has_api and not has_channel:
logger.warning(
f"Source {index} (chat): No 'channel' or 'channel_id' specified for API mode"
)
def get_sources_by_type(self, source_type: str) -> list[dict[str, Any]]:
"""
Get all sources of a specific type.
Args:
source_type: 'documentation', 'github', 'pdf', or 'local'
source_type: Any valid source type string
Returns:
List of sources matching the type

File diff suppressed because it is too large

View File

@@ -140,6 +140,26 @@ class CreateCommand:
return self._route_video()
elif self.source_info.type == "config":
return self._route_config()
elif self.source_info.type == "jupyter":
return self._route_generic("jupyter_scraper", "--notebook")
elif self.source_info.type == "html":
return self._route_generic("html_scraper", "--html-path")
elif self.source_info.type == "openapi":
return self._route_generic("openapi_scraper", "--spec")
elif self.source_info.type == "asciidoc":
return self._route_generic("asciidoc_scraper", "--asciidoc-path")
elif self.source_info.type == "pptx":
return self._route_generic("pptx_scraper", "--pptx")
elif self.source_info.type == "rss":
return self._route_generic("rss_scraper", "--feed-path")
elif self.source_info.type == "manpage":
return self._route_generic("man_scraper", "--man-path")
elif self.source_info.type == "confluence":
return self._route_generic("confluence_scraper", "--export-path")
elif self.source_info.type == "notion":
return self._route_generic("notion_scraper", "--export-path")
elif self.source_info.type == "chat":
return self._route_generic("chat_scraper", "--export-path")
else:
logger.error(f"Unknown source type: {self.source_info.type}")
return 1
@@ -485,6 +505,40 @@ class CreateCommand:
finally:
sys.argv = original_argv
def _route_generic(self, module_name: str, file_flag: str) -> int:
"""Generic routing for new source types.
Most new source types (jupyter, html, openapi, asciidoc, pptx, rss,
manpage, confluence, notion, chat) follow the same pattern:
import module, build argv with --flag <file_path>, add common args, call main().
Args:
module_name: Python module name under skill_seekers.cli (e.g., "jupyter_scraper")
file_flag: CLI flag for the source file (e.g., "--notebook")
Returns:
Exit code from scraper
"""
import importlib
module = importlib.import_module(f"skill_seekers.cli.{module_name}")
argv = [module_name]
file_path = self.source_info.parsed.get("file_path", "")
if file_path:
argv.extend([file_flag, file_path])
self._add_common_args(argv)
logger.debug(f"Calling {module_name} with argv: {argv}")
original_argv = sys.argv
try:
sys.argv = argv
return module.main()
finally:
sys.argv = original_argv
def _add_common_args(self, argv: list[str]) -> None:
"""Add truly universal arguments to argv list.

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -15,7 +15,17 @@ Commands:
word Extract from Word (.docx) file
epub Extract from EPUB e-book (.epub)
video Extract from video (YouTube or local)
unified Multi-source scraping (docs + GitHub + PDF)
jupyter Extract from Jupyter Notebook (.ipynb)
html Extract from local HTML files
openapi Extract from OpenAPI/Swagger spec
asciidoc Extract from AsciiDoc documents (.adoc)
pptx Extract from PowerPoint (.pptx)
rss Extract from RSS/Atom feeds
manpage Extract from man pages
confluence Extract from Confluence wiki
notion Extract from Notion pages
chat Extract from Slack/Discord chat exports
unified Multi-source scraping (docs + GitHub + PDF + more)
analyze Analyze local codebase and extract code knowledge
enhance AI-powered enhancement (auto: API or LOCAL mode)
enhance-status Check enhancement status (for background/daemon modes)
@@ -70,6 +80,17 @@ COMMAND_MODULES = {
"quality": "skill_seekers.cli.quality_metrics",
"workflows": "skill_seekers.cli.workflows_command",
"sync-config": "skill_seekers.cli.sync_config",
# New source types (v3.2.0+)
"jupyter": "skill_seekers.cli.jupyter_scraper",
"html": "skill_seekers.cli.html_scraper",
"openapi": "skill_seekers.cli.openapi_scraper",
"asciidoc": "skill_seekers.cli.asciidoc_scraper",
"pptx": "skill_seekers.cli.pptx_scraper",
"rss": "skill_seekers.cli.rss_scraper",
"manpage": "skill_seekers.cli.man_scraper",
"confluence": "skill_seekers.cli.confluence_scraper",
"notion": "skill_seekers.cli.notion_scraper",
"chat": "skill_seekers.cli.chat_scraper",
}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -33,6 +33,18 @@ from .quality_parser import QualityParser
from .workflows_parser import WorkflowsParser
from .sync_config_parser import SyncConfigParser
# New source type parsers (v3.2.0+)
from .jupyter_parser import JupyterParser
from .html_parser import HtmlParser
from .openapi_parser import OpenAPIParser
from .asciidoc_parser import AsciiDocParser
from .pptx_parser import PptxParser
from .rss_parser import RssParser
from .manpage_parser import ManPageParser
from .confluence_parser import ConfluenceParser
from .notion_parser import NotionParser
from .chat_parser import ChatParser
# Registry of all parsers (in order of usage frequency)
PARSERS = [
CreateParser(), # NEW: Unified create command (placed first for prominence)
@@ -60,6 +72,17 @@ PARSERS = [
QualityParser(),
WorkflowsParser(),
SyncConfigParser(),
# New source types (v3.2.0+)
JupyterParser(),
HtmlParser(),
OpenAPIParser(),
AsciiDocParser(),
PptxParser(),
RssParser(),
ManPageParser(),
ConfluenceParser(),
NotionParser(),
ChatParser(),
]
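The `PARSERS` registry above lets the top-level CLI build every subcommand in one loop instead of hand-wiring each one. A minimal sketch of that pattern (toy parser classes, not the project's `SubcommandParser` base):

```python
import argparse

class Sub:
    # Toy stand-in for SubcommandParser: a name plus an add_arguments hook.
    def __init__(self, name: str, flag: str):
        self.name, self.flag = name, flag

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument(self.flag, type=str)

REGISTRY = [Sub("jupyter", "--notebook"), Sub("rss", "--feed-url")]

root = argparse.ArgumentParser(prog="skill-seekers")
subparsers = root.add_subparsers(dest="command")
for p in REGISTRY:
    p.add_arguments(subparsers.add_parser(p.name))

args = root.parse_args(["jupyter", "--notebook", "demo.ipynb"])
```

Adding a new source type then reduces to appending one entry to the registry, which is exactly what this commit does ten times over.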

View File

@@ -0,0 +1,32 @@
"""AsciiDoc subcommand parser.
Uses shared argument definitions from arguments.asciidoc to ensure
consistency with the standalone asciidoc_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.asciidoc import add_asciidoc_arguments
class AsciiDocParser(SubcommandParser):
"""Parser for asciidoc subcommand."""
@property
def name(self) -> str:
return "asciidoc"
@property
def help(self) -> str:
return "Extract from AsciiDoc documents (.adoc)"
@property
def description(self) -> str:
return "Extract content from AsciiDoc documents (.adoc) and generate skill"
def add_arguments(self, parser):
"""Add asciidoc-specific arguments.
Uses shared argument definitions to ensure consistency
with asciidoc_scraper.py (standalone scraper).
"""
add_asciidoc_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""Chat subcommand parser.
Uses shared argument definitions from arguments.chat to ensure
consistency with the standalone chat_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.chat import add_chat_arguments
class ChatParser(SubcommandParser):
"""Parser for chat subcommand."""
@property
def name(self) -> str:
return "chat"
@property
def help(self) -> str:
return "Extract from Slack/Discord chat exports"
@property
def description(self) -> str:
return "Extract content from Slack/Discord chat exports and generate skill"
def add_arguments(self, parser):
"""Add chat-specific arguments.
Uses shared argument definitions to ensure consistency
with chat_scraper.py (standalone scraper).
"""
add_chat_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""Confluence subcommand parser.
Uses shared argument definitions from arguments.confluence to ensure
consistency with the standalone confluence_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.confluence import add_confluence_arguments
class ConfluenceParser(SubcommandParser):
"""Parser for confluence subcommand."""
@property
def name(self) -> str:
return "confluence"
@property
def help(self) -> str:
return "Extract from Confluence wiki"
@property
def description(self) -> str:
return "Extract content from Confluence wiki and generate skill"
def add_arguments(self, parser):
"""Add confluence-specific arguments.
Uses shared argument definitions to ensure consistency
with confluence_scraper.py (standalone scraper).
"""
add_confluence_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""HTML subcommand parser.
Uses shared argument definitions from arguments.html to ensure
consistency with the standalone html_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.html import add_html_arguments
class HtmlParser(SubcommandParser):
"""Parser for html subcommand."""
@property
def name(self) -> str:
return "html"
@property
def help(self) -> str:
return "Extract from local HTML files (.html/.htm)"
@property
def description(self) -> str:
return "Extract content from local HTML files (.html/.htm) and generate skill"
def add_arguments(self, parser):
"""Add html-specific arguments.
Uses shared argument definitions to ensure consistency
with html_scraper.py (standalone scraper).
"""
add_html_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""Jupyter Notebook subcommand parser.
Uses shared argument definitions from arguments.jupyter to ensure
consistency with the standalone jupyter_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.jupyter import add_jupyter_arguments
class JupyterParser(SubcommandParser):
"""Parser for jupyter subcommand."""
@property
def name(self) -> str:
return "jupyter"
@property
def help(self) -> str:
return "Extract from Jupyter Notebook (.ipynb)"
@property
def description(self) -> str:
return "Extract content from Jupyter Notebook (.ipynb) and generate skill"
def add_arguments(self, parser):
"""Add jupyter-specific arguments.
Uses shared argument definitions to ensure consistency
with jupyter_scraper.py (standalone scraper).
"""
add_jupyter_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""Man page subcommand parser.
Uses shared argument definitions from arguments.manpage to ensure
consistency with the standalone man_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.manpage import add_manpage_arguments
class ManPageParser(SubcommandParser):
"""Parser for manpage subcommand."""
@property
def name(self) -> str:
return "manpage"
@property
def help(self) -> str:
return "Extract from man pages"
@property
def description(self) -> str:
return "Extract content from man pages and generate skill"
def add_arguments(self, parser):
"""Add manpage-specific arguments.
Uses shared argument definitions to ensure consistency
with man_scraper.py (standalone scraper).
"""
add_manpage_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""Notion subcommand parser.
Uses shared argument definitions from arguments.notion to ensure
consistency with the standalone notion_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.notion import add_notion_arguments
class NotionParser(SubcommandParser):
"""Parser for notion subcommand."""
@property
def name(self) -> str:
return "notion"
@property
def help(self) -> str:
return "Extract from Notion pages"
@property
def description(self) -> str:
return "Extract content from Notion pages and generate skill"
def add_arguments(self, parser):
"""Add notion-specific arguments.
Uses shared argument definitions to ensure consistency
with notion_scraper.py (standalone scraper).
"""
add_notion_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""OpenAPI subcommand parser.
Uses shared argument definitions from arguments.openapi to ensure
consistency with the standalone openapi_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.openapi import add_openapi_arguments
class OpenAPIParser(SubcommandParser):
"""Parser for openapi subcommand."""
@property
def name(self) -> str:
return "openapi"
@property
def help(self) -> str:
return "Extract from OpenAPI/Swagger spec"
@property
def description(self) -> str:
return "Extract content from OpenAPI/Swagger spec and generate skill"
def add_arguments(self, parser):
"""Add openapi-specific arguments.
Uses shared argument definitions to ensure consistency
with openapi_scraper.py (standalone scraper).
"""
add_openapi_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""PPTX subcommand parser.
Uses shared argument definitions from arguments.pptx to ensure
consistency with the standalone pptx_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.pptx import add_pptx_arguments
class PptxParser(SubcommandParser):
"""Parser for pptx subcommand."""
@property
def name(self) -> str:
return "pptx"
@property
def help(self) -> str:
return "Extract from PowerPoint presentations (.pptx)"
@property
def description(self) -> str:
return "Extract content from PowerPoint presentations (.pptx) and generate skill"
def add_arguments(self, parser):
"""Add pptx-specific arguments.
Uses shared argument definitions to ensure consistency
with pptx_scraper.py (standalone scraper).
"""
add_pptx_arguments(parser)

View File

@@ -0,0 +1,32 @@
"""RSS subcommand parser.
Uses shared argument definitions from arguments.rss to ensure
consistency with the standalone rss_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.rss import add_rss_arguments
class RssParser(SubcommandParser):
"""Parser for rss subcommand."""
@property
def name(self) -> str:
return "rss"
@property
def help(self) -> str:
return "Extract from RSS/Atom feeds"
@property
def description(self) -> str:
return "Extract content from RSS/Atom feeds and generate skill"
def add_arguments(self, parser):
"""Add rss-specific arguments.
Uses shared argument definitions to ensure consistency
with rss_scraper.py (standalone scraper).
"""
add_rss_arguments(parser)

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -1,7 +1,12 @@
"""Source type detection for unified create command.
Auto-detects whether a source is a web URL, GitHub repository,
local directory, PDF file, or config file based on patterns.
Auto-detects source type from user input: web URLs, GitHub repos,
local directories, and 14+ file types (PDF, DOCX, EPUB, IPYNB, HTML, YAML/OpenAPI,
AsciiDoc, PPTX, RSS/Atom, man pages, video files, and config JSON).
Note: Confluence, Notion, and Slack/Discord chat sources are API/export-based
and cannot be auto-detected from a single argument. Use their dedicated
subcommands (``skill-seekers confluence``, ``notion``, ``chat``) instead.
"""
import os
@@ -66,11 +71,49 @@ class SourceDetector:
if source.endswith(".epub"):
return cls._detect_epub(source)
if source.endswith(".ipynb"):
return cls._detect_jupyter(source)
if source.lower().endswith((".html", ".htm")):
return cls._detect_html(source)
if source.endswith(".pptx"):
return cls._detect_pptx(source)
if source.lower().endswith((".adoc", ".asciidoc")):
return cls._detect_asciidoc(source)
# Man page file extensions (.1 through .8, .man)
# Only match if the basename looks like a man page (e.g., "git.1", not "access.log.1"):
# the basename without the extension must be a plausible command name
if source.lower().endswith(".man"):
return cls._detect_manpage(source)
MAN_SECTION_EXTENSIONS = (".1", ".2", ".3", ".4", ".5", ".6", ".7", ".8")
if source.lower().endswith(MAN_SECTION_EXTENSIONS):
# Heuristic: man pages have a simple basename (no dots before extension)
# e.g., "git.1" is a man page, "access.log.1" is not
basename_no_ext = os.path.splitext(os.path.basename(source))[0]
if "." not in basename_no_ext:
return cls._detect_manpage(source)
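The heuristic above can be exercised in isolation. This standalone sketch (the helper name is hypothetical, not part of the codebase) mirrors the extension-plus-basename check:

```python
import os

MAN_SECTION_EXTENSIONS = (".1", ".2", ".3", ".4", ".5", ".6", ".7", ".8")

def looks_like_manpage(path: str) -> bool:
    """Mirror of the detection heuristic: a man-section extension plus
    a dot-free basename ('git.1' matches, 'access.log.1' does not)."""
    if path.lower().endswith(".man"):
        return True
    if path.lower().endswith(MAN_SECTION_EXTENSIONS):
        basename_no_ext = os.path.splitext(os.path.basename(path))[0]
        return "." not in basename_no_ext
    return False

print(looks_like_manpage("git.1"))         # True
print(looks_like_manpage("access.log.1"))  # False (rotated log, not a man page)
```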
# Video file extensions
VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv")
if source.lower().endswith(VIDEO_EXTENSIONS):
return cls._detect_video_file(source)
# RSS/Atom feed file extensions (only .rss and .atom; .xml is too generic)
if source.lower().endswith((".rss", ".atom")):
return cls._detect_rss(source)
# OpenAPI/Swagger spec detection (YAML files with OpenAPI content)
# Sniff file content for 'openapi:' or 'swagger:' keys before committing
if (
source.lower().endswith((".yaml", ".yml"))
and os.path.isfile(source)
and cls._looks_like_openapi(source)
):
return cls._detect_openapi(source)
# 2. Video URL detection (before directory check)
video_url_info = cls._detect_video_url(source)
if video_url_info:
@@ -97,15 +140,22 @@ class SourceDetector:
raise ValueError(
f"Cannot determine source type for: {source}\n\n"
"Examples:\n"
" Web: skill-seekers create https://docs.react.dev/\n"
" GitHub: skill-seekers create facebook/react\n"
" Local: skill-seekers create ./my-project\n"
" PDF: skill-seekers create tutorial.pdf\n"
" DOCX: skill-seekers create document.docx\n"
" EPUB: skill-seekers create ebook.epub\n"
" Video: skill-seekers create https://youtube.com/watch?v=...\n"
" Video: skill-seekers create recording.mp4\n"
" Config: skill-seekers create configs/react.json"
" Web: skill-seekers create https://docs.react.dev/\n"
" GitHub: skill-seekers create facebook/react\n"
" Local: skill-seekers create ./my-project\n"
" PDF: skill-seekers create tutorial.pdf\n"
" DOCX: skill-seekers create document.docx\n"
" EPUB: skill-seekers create ebook.epub\n"
" Jupyter: skill-seekers create notebook.ipynb\n"
" HTML: skill-seekers create page.html\n"
" OpenAPI: skill-seekers create openapi.yaml\n"
" AsciiDoc: skill-seekers create document.adoc\n"
" PowerPoint: skill-seekers create presentation.pptx\n"
" RSS: skill-seekers create feed.rss\n"
" Man page: skill-seekers create command.1\n"
" Video: skill-seekers create https://youtube.com/watch?v=...\n"
" Video: skill-seekers create recording.mp4\n"
" Config: skill-seekers create configs/react.json"
)
@classmethod
@@ -140,6 +190,90 @@ class SourceDetector:
type="epub", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_jupyter(cls, source: str) -> SourceInfo:
"""Detect Jupyter Notebook file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="jupyter", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_html(cls, source: str) -> SourceInfo:
"""Detect local HTML file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="html", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_pptx(cls, source: str) -> SourceInfo:
"""Detect PowerPoint file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="pptx", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_asciidoc(cls, source: str) -> SourceInfo:
"""Detect AsciiDoc file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="asciidoc", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_manpage(cls, source: str) -> SourceInfo:
"""Detect man page file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="manpage", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_rss(cls, source: str) -> SourceInfo:
"""Detect RSS/Atom feed file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="rss", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _looks_like_openapi(cls, source: str) -> bool:
"""Check if a YAML/JSON file looks like an OpenAPI or Swagger spec.
Reads the first few lines to look for 'openapi:' or 'swagger:' keys.
Args:
source: Path to the file
Returns:
True if the file appears to be an OpenAPI/Swagger spec
"""
try:
with open(source, encoding="utf-8", errors="replace") as f:
# Read the first 20 lines; the openapi/swagger key typically appears near the top
for _ in range(20):
line = f.readline()
if not line:
break
stripped = line.strip().lower()
if stripped.startswith("openapi:") or stripped.startswith("swagger:"):
return True
if stripped.startswith('"openapi"') or stripped.startswith('"swagger"'):
return True
except OSError:
pass
return False
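A minimal round-trip of the sniffing logic, reimplemented standalone for illustration (the helper function and the temporary file are assumptions for the demo, not project code):

```python
import os
import tempfile

def looks_like_openapi(path: str) -> bool:
    """Scan the first 20 lines for an 'openapi:'/'swagger:' key (YAML or JSON)."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            for _ in range(20):
                line = f.readline()
                if not line:
                    break
                stripped = line.strip().lower()
                if stripped.startswith(("openapi:", "swagger:", '"openapi"', '"swagger"')):
                    return True
    except OSError:
        pass
    return False

# Write a throwaway spec-like YAML file and sniff it
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write("openapi: 3.1.0\ninfo:\n  title: Demo\n")
    spec = f.name

print(looks_like_openapi(spec))  # True
os.unlink(spec)
```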
@classmethod
def _detect_openapi(cls, source: str) -> SourceInfo:
"""Detect OpenAPI/Swagger spec file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="openapi", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_video_file(cls, source: str) -> SourceInfo:
"""Detect local video file source."""
@@ -312,5 +446,19 @@ class SourceDetector:
if not os.path.isfile(config_path):
raise ValueError(f"Path is not a file: {config_path}")
# For web and github, validation happens during scraping
# (URL accessibility, repo existence)
elif source_info.type in ("jupyter", "html", "pptx", "asciidoc", "manpage", "openapi"):
file_path = source_info.parsed.get("file_path", "")
if file_path:
type_label = source_info.type.upper()
if not os.path.exists(file_path):
raise ValueError(f"{type_label} file does not exist: {file_path}")
if not os.path.isfile(file_path) and not os.path.isdir(file_path):
raise ValueError(f"Path is not a file or directory: {file_path}")
elif source_info.type == "rss":
file_path = source_info.parsed.get("file_path", "")
if file_path and not os.path.exists(file_path):
raise ValueError(f"RSS/Atom file does not exist: {file_path}")
# For web, github, confluence, notion, chat, rss (URL), validation happens
# during scraping (URL accessibility, API auth, etc.)

View File

@@ -76,6 +76,17 @@ class UnifiedScraper:
"word": [], # List of word sources
"video": [], # List of video sources
"local": [], # List of local sources (docs or code)
"epub": [], # List of epub sources
"jupyter": [], # List of Jupyter notebook sources
"html": [], # List of local HTML sources
"openapi": [], # List of OpenAPI/Swagger spec sources
"asciidoc": [], # List of AsciiDoc sources
"pptx": [], # List of PowerPoint sources
"confluence": [], # List of Confluence wiki sources
"notion": [], # List of Notion page sources
"rss": [], # List of RSS/Atom feed sources
"manpage": [], # List of man page sources
"chat": [], # List of Slack/Discord chat sources
}
# Track source index for unique naming (multi-source support)
@@ -86,6 +97,17 @@ class UnifiedScraper:
"word": 0,
"video": 0,
"local": 0,
"epub": 0,
"jupyter": 0,
"html": 0,
"openapi": 0,
"asciidoc": 0,
"pptx": 0,
"confluence": 0,
"notion": 0,
"rss": 0,
"manpage": 0,
"chat": 0,
}
# Output paths - cleaner organization
@@ -166,6 +188,28 @@ class UnifiedScraper:
self._scrape_video(source)
elif source_type == "local":
self._scrape_local(source)
elif source_type == "epub":
self._scrape_epub(source)
elif source_type == "jupyter":
self._scrape_jupyter(source)
elif source_type == "html":
self._scrape_html(source)
elif source_type == "openapi":
self._scrape_openapi(source)
elif source_type == "asciidoc":
self._scrape_asciidoc(source)
elif source_type == "pptx":
self._scrape_pptx(source)
elif source_type == "confluence":
self._scrape_confluence(source)
elif source_type == "notion":
self._scrape_notion(source)
elif source_type == "rss":
self._scrape_rss(source)
elif source_type == "manpage":
self._scrape_manpage(source)
elif source_type == "chat":
self._scrape_chat(source)
else:
logger.warning(f"Unknown source type: {source_type}")
except Exception as e:
@@ -571,6 +615,7 @@ class UnifiedScraper:
{
"docx_path": docx_path,
"docx_id": docx_id,
"word_id": docx_id, # Alias for generic reference generation
"idx": idx,
"data": word_data,
"data_file": cache_word_data,
@@ -788,6 +833,595 @@ class UnifiedScraper:
logger.debug(f"Traceback: {traceback.format_exc()}")
raise
# ------------------------------------------------------------------
# New source type handlers (v3.2.0+)
# ------------------------------------------------------------------
def _scrape_epub(self, source: dict[str, Any]):
"""Scrape EPUB e-book (.epub)."""
try:
from skill_seekers.cli.epub_scraper import EpubToSkillConverter
except ImportError:
logger.error(
"EPUB scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[epub]"
)
return
idx = self._source_counters["epub"]
self._source_counters["epub"] += 1
epub_path = source["path"]
epub_id = os.path.splitext(os.path.basename(epub_path))[0]
epub_config = {
"name": f"{self.name}_epub_{idx}_{epub_id}",
"epub_path": source["path"],
"description": source.get("description", f"{epub_id} e-book"),
}
logger.info(f"Scraping EPUB: {source['path']}")
converter = EpubToSkillConverter(epub_config)
converter.extract_epub()
epub_data_file = converter.data_file
with open(epub_data_file, encoding="utf-8") as f:
epub_data = json.load(f)
cache_epub_data = os.path.join(self.data_dir, f"epub_data_{idx}_{epub_id}.json")
shutil.copy(epub_data_file, cache_epub_data)
self.scraped_data["epub"].append(
{
"epub_path": epub_path,
"epub_id": epub_id,
"idx": idx,
"data": epub_data,
"data_file": cache_epub_data,
}
)
try:
converter.build_skill()
logger.info("✅ EPUB: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone EPUB SKILL.md: {e}")
logger.info(f"✅ EPUB: {len(epub_data.get('chapters', []))} chapters extracted")
def _scrape_jupyter(self, source: dict[str, Any]):
"""Scrape Jupyter Notebook (.ipynb)."""
try:
from skill_seekers.cli.jupyter_scraper import JupyterToSkillConverter
except ImportError:
logger.error(
"Jupyter scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[jupyter]"
)
return
idx = self._source_counters["jupyter"]
self._source_counters["jupyter"] += 1
nb_path = source["path"]
nb_id = os.path.splitext(os.path.basename(nb_path))[0]
nb_config = {
"name": f"{self.name}_jupyter_{idx}_{nb_id}",
"notebook_path": source["path"],
"description": source.get("description", f"{nb_id} notebook"),
}
logger.info(f"Scraping Jupyter Notebook: {source['path']}")
converter = JupyterToSkillConverter(nb_config)
converter.extract_notebook()
nb_data_file = converter.data_file
with open(nb_data_file, encoding="utf-8") as f:
nb_data = json.load(f)
cache_nb_data = os.path.join(self.data_dir, f"jupyter_data_{idx}_{nb_id}.json")
shutil.copy(nb_data_file, cache_nb_data)
self.scraped_data["jupyter"].append(
{
"notebook_path": nb_path,
"notebook_id": nb_id,
"idx": idx,
"data": nb_data,
"data_file": cache_nb_data,
}
)
try:
converter.build_skill()
logger.info("✅ Jupyter: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone Jupyter SKILL.md: {e}")
logger.info(f"✅ Jupyter: {len(nb_data.get('cells', []))} cells extracted")
def _scrape_html(self, source: dict[str, Any]):
"""Scrape local HTML file(s)."""
try:
from skill_seekers.cli.html_scraper import HtmlToSkillConverter
except ImportError:
logger.error("html_scraper.py not found")
return
idx = self._source_counters["html"]
self._source_counters["html"] += 1
html_path = source["path"]
html_id = os.path.splitext(os.path.basename(html_path.rstrip("/")))[0]
html_config = {
"name": f"{self.name}_html_{idx}_{html_id}",
"html_path": source["path"],
"description": source.get("description", f"{html_id} HTML content"),
}
logger.info(f"Scraping local HTML: {source['path']}")
converter = HtmlToSkillConverter(html_config)
converter.extract_html()
html_data_file = converter.data_file
with open(html_data_file, encoding="utf-8") as f:
html_data = json.load(f)
cache_html_data = os.path.join(self.data_dir, f"html_data_{idx}_{html_id}.json")
shutil.copy(html_data_file, cache_html_data)
self.scraped_data["html"].append(
{
"html_path": html_path,
"html_id": html_id,
"idx": idx,
"data": html_data,
"data_file": cache_html_data,
}
)
try:
converter.build_skill()
logger.info("✅ HTML: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone HTML SKILL.md: {e}")
logger.info(f"✅ HTML: {len(html_data.get('pages', []))} pages extracted")
def _scrape_openapi(self, source: dict[str, Any]):
"""Scrape OpenAPI/Swagger specification."""
try:
from skill_seekers.cli.openapi_scraper import OpenAPIToSkillConverter
except ImportError:
logger.error("openapi_scraper.py not found")
return
idx = self._source_counters["openapi"]
self._source_counters["openapi"] += 1
spec_path = source.get("path", source.get("url", ""))
spec_id = os.path.splitext(os.path.basename(spec_path))[0] if spec_path else f"spec_{idx}"
openapi_config = {
"name": f"{self.name}_openapi_{idx}_{spec_id}",
"spec_path": source.get("path"),
"spec_url": source.get("url"),
"description": source.get("description", f"{spec_id} API spec"),
}
logger.info(f"Scraping OpenAPI spec: {spec_path}")
converter = OpenAPIToSkillConverter(openapi_config)
converter.extract_spec()
api_data_file = converter.data_file
with open(api_data_file, encoding="utf-8") as f:
api_data = json.load(f)
cache_api_data = os.path.join(self.data_dir, f"openapi_data_{idx}_{spec_id}.json")
shutil.copy(api_data_file, cache_api_data)
self.scraped_data["openapi"].append(
{
"spec_path": spec_path,
"spec_id": spec_id,
"idx": idx,
"data": api_data,
"data_file": cache_api_data,
}
)
try:
converter.build_skill()
logger.info("✅ OpenAPI: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone OpenAPI SKILL.md: {e}")
logger.info(f"✅ OpenAPI: {len(api_data.get('endpoints', []))} endpoints extracted")
def _scrape_asciidoc(self, source: dict[str, Any]):
"""Scrape AsciiDoc document(s)."""
try:
from skill_seekers.cli.asciidoc_scraper import AsciiDocToSkillConverter
except ImportError:
logger.error(
"AsciiDoc scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[asciidoc]"
)
return
idx = self._source_counters["asciidoc"]
self._source_counters["asciidoc"] += 1
adoc_path = source["path"]
adoc_id = os.path.splitext(os.path.basename(adoc_path.rstrip("/")))[0]
adoc_config = {
"name": f"{self.name}_asciidoc_{idx}_{adoc_id}",
"asciidoc_path": source["path"],
"description": source.get("description", f"{adoc_id} AsciiDoc content"),
}
logger.info(f"Scraping AsciiDoc: {source['path']}")
converter = AsciiDocToSkillConverter(adoc_config)
converter.extract_asciidoc()
adoc_data_file = converter.data_file
with open(adoc_data_file, encoding="utf-8") as f:
adoc_data = json.load(f)
cache_adoc_data = os.path.join(self.data_dir, f"asciidoc_data_{idx}_{adoc_id}.json")
shutil.copy(adoc_data_file, cache_adoc_data)
self.scraped_data["asciidoc"].append(
{
"asciidoc_path": adoc_path,
"asciidoc_id": adoc_id,
"idx": idx,
"data": adoc_data,
"data_file": cache_adoc_data,
}
)
try:
converter.build_skill()
logger.info("✅ AsciiDoc: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone AsciiDoc SKILL.md: {e}")
logger.info(f"✅ AsciiDoc: {len(adoc_data.get('sections', []))} sections extracted")
def _scrape_pptx(self, source: dict[str, Any]):
"""Scrape PowerPoint presentation (.pptx)."""
try:
from skill_seekers.cli.pptx_scraper import PptxToSkillConverter
except ImportError:
logger.error(
"PowerPoint scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[pptx]"
)
return
idx = self._source_counters["pptx"]
self._source_counters["pptx"] += 1
pptx_path = source["path"]
pptx_id = os.path.splitext(os.path.basename(pptx_path))[0]
pptx_config = {
"name": f"{self.name}_pptx_{idx}_{pptx_id}",
"pptx_path": source["path"],
"description": source.get("description", f"{pptx_id} presentation"),
}
logger.info(f"Scraping PowerPoint: {source['path']}")
converter = PptxToSkillConverter(pptx_config)
converter.extract_pptx()
pptx_data_file = converter.data_file
with open(pptx_data_file, encoding="utf-8") as f:
pptx_data = json.load(f)
cache_pptx_data = os.path.join(self.data_dir, f"pptx_data_{idx}_{pptx_id}.json")
shutil.copy(pptx_data_file, cache_pptx_data)
self.scraped_data["pptx"].append(
{
"pptx_path": pptx_path,
"pptx_id": pptx_id,
"idx": idx,
"data": pptx_data,
"data_file": cache_pptx_data,
}
)
try:
converter.build_skill()
logger.info("✅ PowerPoint: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone PowerPoint SKILL.md: {e}")
logger.info(f"✅ PowerPoint: {len(pptx_data.get('slides', []))} slides extracted")
def _scrape_confluence(self, source: dict[str, Any]):
"""Scrape Confluence wiki (API or exported HTML/XML)."""
try:
from skill_seekers.cli.confluence_scraper import ConfluenceToSkillConverter
except ImportError:
logger.error(
"Confluence scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[confluence]"
)
return
idx = self._source_counters["confluence"]
self._source_counters["confluence"] += 1
source_id = source.get("space_key", source.get("path", f"confluence_{idx}"))
if isinstance(source_id, str) and "/" in source_id:
source_id = os.path.basename(source_id.rstrip("/"))
conf_config = {
"name": f"{self.name}_confluence_{idx}_{source_id}",
"base_url": source.get("base_url", source.get("url")),
"space_key": source.get("space_key"),
"export_path": source.get("path"),
"username": source.get("username"),
"token": source.get("token"),
"description": source.get("description", f"{source_id} Confluence content"),
"max_pages": source.get("max_pages", 500),
}
logger.info(f"Scraping Confluence: {source_id}")
converter = ConfluenceToSkillConverter(conf_config)
converter.extract_confluence()
conf_data_file = converter.data_file
with open(conf_data_file, encoding="utf-8") as f:
conf_data = json.load(f)
cache_conf_data = os.path.join(self.data_dir, f"confluence_data_{idx}_{source_id}.json")
shutil.copy(conf_data_file, cache_conf_data)
self.scraped_data["confluence"].append(
{
"source_id": source_id,
"idx": idx,
"data": conf_data,
"data_file": cache_conf_data,
}
)
try:
converter.build_skill()
logger.info("✅ Confluence: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone Confluence SKILL.md: {e}")
logger.info(f"✅ Confluence: {len(conf_data.get('pages', []))} pages extracted")
def _scrape_notion(self, source: dict[str, Any]):
"""Scrape Notion pages (API or exported Markdown)."""
try:
from skill_seekers.cli.notion_scraper import NotionToSkillConverter
except ImportError:
logger.error(
"Notion scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[notion]"
)
return
idx = self._source_counters["notion"]
self._source_counters["notion"] += 1
source_id = source.get(
"database_id", source.get("page_id", source.get("path", f"notion_{idx}"))
)
if isinstance(source_id, str) and "/" in source_id:
source_id = os.path.basename(source_id.rstrip("/"))
notion_config = {
"name": f"{self.name}_notion_{idx}_{source_id}",
"database_id": source.get("database_id"),
"page_id": source.get("page_id"),
"export_path": source.get("path"),
"token": source.get("token"),
"description": source.get("description", f"{source_id} Notion content"),
"max_pages": source.get("max_pages", 500),
}
logger.info(f"Scraping Notion: {source_id}")
converter = NotionToSkillConverter(notion_config)
converter.extract_notion()
notion_data_file = converter.data_file
with open(notion_data_file, encoding="utf-8") as f:
notion_data = json.load(f)
cache_notion_data = os.path.join(self.data_dir, f"notion_data_{idx}_{source_id}.json")
shutil.copy(notion_data_file, cache_notion_data)
self.scraped_data["notion"].append(
{
"source_id": source_id,
"idx": idx,
"data": notion_data,
"data_file": cache_notion_data,
}
)
try:
converter.build_skill()
logger.info("✅ Notion: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone Notion SKILL.md: {e}")
logger.info(f"✅ Notion: {len(notion_data.get('pages', []))} pages extracted")
def _scrape_rss(self, source: dict[str, Any]):
"""Scrape RSS/Atom feed (with optional full article scraping)."""
try:
from skill_seekers.cli.rss_scraper import RssToSkillConverter
except ImportError:
logger.error(
"RSS scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[rss]"
)
return
idx = self._source_counters["rss"]
self._source_counters["rss"] += 1
feed_url = source.get("url", source.get("path", ""))
feed_id = feed_url.split("/")[-1].split(".")[0] if feed_url else f"feed_{idx}"
rss_config = {
"name": f"{self.name}_rss_{idx}_{feed_id}",
"feed_url": source.get("url"),
"feed_path": source.get("path"),
"follow_links": source.get("follow_links", True),
"max_articles": source.get("max_articles", 50),
"description": source.get("description", f"{feed_id} RSS/Atom feed"),
}
logger.info(f"Scraping RSS/Atom feed: {feed_url}")
converter = RssToSkillConverter(rss_config)
converter.extract_feed()
rss_data_file = converter.data_file
with open(rss_data_file, encoding="utf-8") as f:
rss_data = json.load(f)
cache_rss_data = os.path.join(self.data_dir, f"rss_data_{idx}_{feed_id}.json")
shutil.copy(rss_data_file, cache_rss_data)
self.scraped_data["rss"].append(
{
"feed_url": feed_url,
"feed_id": feed_id,
"idx": idx,
"data": rss_data,
"data_file": cache_rss_data,
}
)
try:
converter.build_skill()
logger.info("✅ RSS: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone RSS SKILL.md: {e}")
logger.info(f"✅ RSS: {len(rss_data.get('articles', []))} articles extracted")
def _scrape_manpage(self, source: dict[str, Any]):
"""Scrape man page(s)."""
try:
from skill_seekers.cli.man_scraper import ManPageToSkillConverter
except ImportError:
logger.error("man_scraper.py not found")
return
idx = self._source_counters["manpage"]
self._source_counters["manpage"] += 1
man_names = source.get("names", [])
man_path = source.get("path", "")
man_id = man_names[0] if man_names else os.path.basename(man_path.rstrip("/"))
man_config = {
"name": f"{self.name}_manpage_{idx}_{man_id}",
"man_names": man_names,
"man_path": man_path,
"sections": source.get("sections", []),
"description": source.get("description", f"{man_id} man pages"),
}
logger.info(f"Scraping man pages: {man_id}")
converter = ManPageToSkillConverter(man_config)
converter.extract_manpages()
man_data_file = converter.data_file
with open(man_data_file, encoding="utf-8") as f:
man_data = json.load(f)
cache_man_data = os.path.join(self.data_dir, f"manpage_data_{idx}_{man_id}.json")
shutil.copy(man_data_file, cache_man_data)
self.scraped_data["manpage"].append(
{
"man_id": man_id,
"idx": idx,
"data": man_data,
"data_file": cache_man_data,
}
)
try:
converter.build_skill()
logger.info("✅ Man pages: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone man page SKILL.md: {e}")
logger.info(f"✅ Man pages: {len(man_data.get('pages', []))} man pages extracted")
def _scrape_chat(self, source: dict[str, Any]):
"""Scrape Slack/Discord chat export or API."""
try:
from skill_seekers.cli.chat_scraper import ChatToSkillConverter
except ImportError:
logger.error(
"Chat scraper dependencies not installed.\n"
" Install with: pip install skill-seekers[chat]"
)
return
idx = self._source_counters["chat"]
self._source_counters["chat"] += 1
export_path = source.get("path", "")
channel = source.get("channel", source.get("channel_id", ""))
chat_id = channel or os.path.basename(export_path.rstrip("/")) or f"chat_{idx}"
chat_config = {
"name": f"{self.name}_chat_{idx}_{chat_id}",
"export_path": source.get("path"),
"platform": source.get("platform", "slack"),
"token": source.get("token"),
"channel": channel,
"max_messages": source.get("max_messages", 10000),
"description": source.get("description", f"{chat_id} chat export"),
}
logger.info(f"Scraping chat: {chat_id}")
converter = ChatToSkillConverter(chat_config)
converter.extract_chat()
chat_data_file = converter.data_file
with open(chat_data_file, encoding="utf-8") as f:
chat_data = json.load(f)
cache_chat_data = os.path.join(self.data_dir, f"chat_data_{idx}_{chat_id}.json")
shutil.copy(chat_data_file, cache_chat_data)
self.scraped_data["chat"].append(
{
"chat_id": chat_id,
"platform": source.get("platform", "slack"),
"idx": idx,
"data": chat_data,
"data_file": cache_chat_data,
}
)
try:
converter.build_skill()
logger.info("✅ Chat: Standalone SKILL.md created")
except Exception as e:
logger.warning(f"⚠️ Failed to build standalone chat SKILL.md: {e}")
logger.info(f"✅ Chat: {len(chat_data.get('messages', []))} messages extracted")
def _load_json(self, file_path: Path) -> dict:
"""
Load JSON file safely.
@@ -1297,14 +1931,33 @@ Examples:
if args.dry_run:
logger.info("🔍 DRY RUN MODE - Preview only, no scraping will occur")
logger.info(f"\nWould scrape {len(scraper.config.get('sources', []))} sources:")
# Source type display config: type -> (label, key for detail)
_SOURCE_DISPLAY = {
"documentation": ("Documentation", "base_url"),
"github": ("GitHub", "repo"),
"pdf": ("PDF", "path"),
"word": ("Word", "path"),
"epub": ("EPUB", "path"),
"video": ("Video", "url"),
"local": ("Local Codebase", "path"),
"jupyter": ("Jupyter Notebook", "path"),
"html": ("HTML", "path"),
"openapi": ("OpenAPI Spec", "path"),
"asciidoc": ("AsciiDoc", "path"),
"pptx": ("PowerPoint", "path"),
"confluence": ("Confluence", "base_url"),
"notion": ("Notion", "page_id"),
"rss": ("RSS/Atom Feed", "url"),
"manpage": ("Man Page", "names"),
"chat": ("Chat Export", "path"),
}
for idx, source in enumerate(scraper.config.get("sources", []), 1):
source_type = source.get("type", "unknown")
if source_type == "documentation":
logger.info(f" {idx}. Documentation: {source.get('base_url', 'N/A')}")
elif source_type == "github":
logger.info(f" {idx}. GitHub: {source.get('repo', 'N/A')}")
elif source_type == "pdf":
logger.info(f" {idx}. PDF: {source.get('pdf_path', 'N/A')}")
label, key = _SOURCE_DISPLAY.get(source_type, (source_type.title(), "path"))
detail = source.get(key, "N/A")
if isinstance(detail, list):
detail = ", ".join(str(d) for d in detail)
logger.info(f" {idx}. {label}: {detail}")
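The table-driven display above replaces the old per-type elif chain. A self-contained sketch (with a trimmed-down mapping; the entries shown are only a subset) illustrates the formatting it produces:

```python
# Subset of the display mapping: type -> (label, key for detail)
_SOURCE_DISPLAY = {
    "github": ("GitHub", "repo"),
    "manpage": ("Man Page", "names"),
}

def format_source(idx: int, source: dict) -> str:
    """Format one dry-run line; unknown types fall back to a title-cased label."""
    source_type = source.get("type", "unknown")
    label, key = _SOURCE_DISPLAY.get(source_type, (source_type.title(), "path"))
    detail = source.get(key, "N/A")
    if isinstance(detail, list):
        detail = ", ".join(str(d) for d in detail)
    return f" {idx}. {label}: {detail}"

print(format_source(1, {"type": "github", "repo": "facebook/react"}))
print(format_source(2, {"type": "manpage", "names": ["git", "tar"]}))
```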
logger.info(f"\nOutput directory: {scraper.output_dir}")
logger.info(f"Merge mode: {scraper.merge_mode}")
return

View File

@@ -136,6 +136,44 @@ class UnifiedSkillBuilder:
skill_mds["pdf"] = "\n\n---\n\n".join(pdf_sources)
logger.debug(f"Combined {len(pdf_sources)} PDF SKILL.md files")
# Load additional source types using generic glob pattern
# Each source type uses: {name}_{type}_{idx}_*/ or {name}_{type}_*/
_extra_types = [
"word",
"epub",
"video",
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
]
for source_type in _extra_types:
type_sources = []
for type_dir in sources_dir.glob(f"{self.name}_{source_type}_*"):
type_skill_path = type_dir / "SKILL.md"
if type_skill_path.exists():
try:
content = type_skill_path.read_text(encoding="utf-8")
type_sources.append(content)
logger.debug(
f"Loaded {source_type} SKILL.md from {type_dir.name} "
f"({len(content)} chars)"
)
except OSError as e:
logger.warning(
f"Failed to read {source_type} SKILL.md from {type_dir.name}: {e}"
)
if type_sources:
skill_mds[source_type] = "\n\n---\n\n".join(type_sources)
logger.debug(f"Combined {len(type_sources)} {source_type} SKILL.md files")
logger.info(f"Loaded {len(skill_mds)} source SKILL.md files")
return skill_mds
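The generic glob loop above can be exercised in isolation. This sketch reproduces the `{name}_{source_type}_*` directory convention against a temporary tree (directory and skill names are illustrative):

```python
import tempfile
from pathlib import Path

def load_type_skill_mds(sources_dir: Path, name: str, source_type: str) -> list[str]:
    """Collect SKILL.md contents from {name}_{source_type}_* directories."""
    contents = []
    # sorted() makes the merge order deterministic across filesystems
    for type_dir in sorted(sources_dir.glob(f"{name}_{source_type}_*")):
        skill_path = type_dir / "SKILL.md"
        if skill_path.exists():
            contents.append(skill_path.read_text(encoding="utf-8"))
    return contents

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for idx in range(2):
        d = root / f"myskill_jupyter_{idx}_example"
        d.mkdir()
        (d / "SKILL.md").write_text(f"# Notebook {idx}", encoding="utf-8")
    merged = "\n\n---\n\n".join(load_type_skill_mds(root, "myskill", "jupyter"))
```

The `\n\n---\n\n` joiner matches the separator the builder uses when combining multiple SKILL.md files of the same type.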
@@ -477,6 +515,18 @@ This skill synthesizes knowledge from multiple sources:
logger.info("Using PDF SKILL.md as-is")
content = skill_mds["pdf"]
# Generic merge for additional source types not covered by pairwise methods
if not content and skill_mds:
# At least one source SKILL.md exists but not docs/github/pdf
logger.info(f"Generic merge for source types: {list(skill_mds.keys())}")
content = self._generic_merge(skill_mds)
elif content and len(skill_mds) > (int(has_docs) + int(has_github) + int(has_pdf)):
# Pairwise synthesis handled the core types; append additional sources
extra_types = set(skill_mds.keys()) - {"documentation", "github", "pdf"}
if extra_types:
logger.info(f"Appending additional sources: {extra_types}")
content = self._append_extra_sources(content, skill_mds, extra_types)
# Fallback: generate minimal SKILL.md (legacy behavior)
if not content:
logger.warning("No source SKILL.md files found, generating minimal SKILL.md (legacy)")
@@ -574,6 +624,165 @@ This skill synthesizes knowledge from multiple sources:
return "\n".join(lines)
# ------------------------------------------------------------------
# Generic merge system for any combination of source types (v3.2.0+)
# ------------------------------------------------------------------
# Human-readable labels for source types
_SOURCE_LABELS: dict[str, str] = {
"documentation": "Documentation",
"github": "GitHub Repository",
"pdf": "PDF Document",
"word": "Word Document",
"epub": "EPUB E-book",
"video": "Video",
"local": "Local Codebase",
"jupyter": "Jupyter Notebook",
"html": "HTML Document",
"openapi": "OpenAPI/Swagger Spec",
"asciidoc": "AsciiDoc Document",
"pptx": "PowerPoint Presentation",
"confluence": "Confluence Wiki",
"notion": "Notion Page",
"rss": "RSS/Atom Feed",
"manpage": "Man Page",
"chat": "Chat Export",
}
def _generic_merge(self, skill_mds: dict[str, str]) -> str:
"""Generic merge for any combination of source types.
Uses a priority-based section ordering approach:
1. Parse all source SKILL.md files into sections
2. Collect unique sections across all sources
3. Merge matching sections with source attribution
4. Produce a unified SKILL.md
This preserves the existing pairwise synthesis for docs+github, docs+pdf, etc.
and handles any other combination generically.
Args:
skill_mds: Dict mapping source type to SKILL.md content
Returns:
Merged SKILL.md content string
"""
skill_name = self.name.lower().replace("_", "-").replace(" ", "-")[:64]
desc = self.description[:1024] if len(self.description) > 1024 else self.description
# Parse all source SKILL.md files into sections
all_sections: dict[str, dict[str, str]] = {}
for source_type, content in skill_mds.items():
all_sections[source_type] = self._parse_skill_md_sections(content)
# Determine all unique section names in priority order
# Sections that appear earlier in sources have higher priority
seen_sections: list[str] = []
for _source_type, sections in all_sections.items():
for section_name in sections:
if section_name not in seen_sections:
seen_sections.append(section_name)
# Build merged content
source_labels = ", ".join(self._SOURCE_LABELS.get(t, t.title()) for t in skill_mds)
lines = [
"---",
f"name: {skill_name}",
f"description: {desc}",
"---",
"",
f"# {self.name.replace('_', ' ').title()}",
"",
f"{self.description}",
"",
f"*Merged from: {source_labels}*",
"",
]
# Emit each section, merging content from all sources that have it
for section_name in seen_sections:
contributing_sources = [
(stype, sections[section_name])
for stype, sections in all_sections.items()
if section_name in sections
]
if len(contributing_sources) == 1:
# Single source for this section — emit as-is
stype, content = contributing_sources[0]
label = self._SOURCE_LABELS.get(stype, stype.title())
lines.append(f"## {section_name}")
lines.append("")
lines.append(f"*From {label}*")
lines.append("")
lines.append(content)
lines.append("")
else:
# Multiple sources — merge with attribution
lines.append(f"## {section_name}")
lines.append("")
for stype, content in contributing_sources:
label = self._SOURCE_LABELS.get(stype, stype.title())
lines.append(f"### From {label}")
lines.append("")
lines.append(content)
lines.append("")
lines.append("---")
lines.append("")
lines.append("*Generated by Skill Seeker's unified multi-source scraper*")
return "\n".join(lines)
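The section-collection core of `_generic_merge` relies on `_parse_skill_md_sections`, which is not shown in this diff. A hedged sketch, assuming a simple parser that splits a SKILL.md body on `## ` headings:

```python
def parse_sections(md: str) -> dict[str, str]:
    """Assumed stand-in for _parse_skill_md_sections: split on '## ' headings."""
    sections, current, buf = {}, None, []
    for line in md.splitlines():
        if line.startswith("## "):
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = line[3:].strip(), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    return sections

def merge_section_names(skill_mds: dict[str, str]) -> list[str]:
    """First-seen ordering of section names across all sources."""
    seen = []
    for content in skill_mds.values():
        for name in parse_sections(content):
            if name not in seen:
                seen.append(name)
    return seen
```

First-seen ordering is what gives earlier sources their section-priority in the merged output.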
def _append_extra_sources(
self,
base_content: str,
skill_mds: dict[str, str],
extra_types: set[str],
) -> str:
"""Append additional source content to existing pairwise-synthesized SKILL.md.
Used when the core docs+github+pdf synthesis has run, but there are
additional source types (epub, jupyter, etc.) that need to be included.
Args:
base_content: Already-synthesized SKILL.md content
skill_mds: All source SKILL.md files
extra_types: Set of extra source type keys to append
Returns:
Extended SKILL.md content
"""
lines = base_content.split("\n")
# Find the final separator (---) or end of file
insertion_index = len(lines)
for i in range(len(lines) - 1, -1, -1):
if lines[i].strip() == "---":
insertion_index = i
break
# Build extra content
extra_lines = [""]
for source_type in sorted(extra_types):
if source_type not in skill_mds:
continue
label = self._SOURCE_LABELS.get(source_type, source_type.title())
sections = self._parse_skill_md_sections(skill_mds[source_type])
extra_lines.append(f"## {label} Content")
extra_lines.append("")
for section_name, content in sections.items():
extra_lines.append(f"### {section_name}")
extra_lines.append("")
extra_lines.append(content)
extra_lines.append("")
lines[insertion_index:insertion_index] = extra_lines
return "\n".join(lines)
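The insertion-point search in `_append_extra_sources` can be sketched on its own: scan backwards for the last `---` separator so extra sections land before the footer rather than after it.

```python
def find_insertion_index(lines: list[str]) -> int:
    """Return the index of the last '---' separator, or end of file."""
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "---":
            return i
    return len(lines)

body = ["# Skill", "", "## Docs", "text", "", "---", "", "*footer*"]
idx = find_insertion_index(body)
# Slice-assignment splices the extra section in place, keeping the footer last
body[idx:idx] = ["", "## EPUB Content", "extra"]
```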
def _generate_minimal_skill_md(self) -> str:
"""Generate minimal SKILL.md (legacy fallback behavior).
@@ -597,18 +806,42 @@ This skill combines knowledge from multiple sources:
"""
# Source type display keys: type -> (label, primary_key, extra_keys)
_source_detail_map = {
"documentation": ("Documentation", "base_url", [("Pages", "max_pages", "unlimited")]),
"github": (
"GitHub Repository",
"repo",
[("Code Analysis", "code_analysis_depth", "surface"), ("Issues", "max_issues", 0)],
),
"pdf": ("PDF Document", "path", []),
"word": ("Word Document", "path", []),
"epub": ("EPUB E-book", "path", []),
"video": ("Video", "url", []),
"local": ("Local Codebase", "path", [("Analysis Depth", "analysis_depth", "surface")]),
"jupyter": ("Jupyter Notebook", "path", []),
"html": ("HTML Document", "path", []),
"openapi": ("OpenAPI Spec", "path", []),
"asciidoc": ("AsciiDoc Document", "path", []),
"pptx": ("PowerPoint", "path", []),
"confluence": ("Confluence Wiki", "base_url", []),
"notion": ("Notion Page", "page_id", []),
"rss": ("RSS/Atom Feed", "url", []),
"manpage": ("Man Page", "names", []),
"chat": ("Chat Export", "path", []),
}
# List sources
for source in self.config.get("sources", []):
source_type = source["type"]
if source_type == "documentation":
content += f"- ✅ **Documentation**: {source.get('base_url', 'N/A')}\n"
content += f" - Pages: {source.get('max_pages', 'unlimited')}\n"
elif source_type == "github":
content += f"- ✅ **GitHub Repository**: {source.get('repo', 'N/A')}\n"
content += f" - Code Analysis: {source.get('code_analysis_depth', 'surface')}\n"
content += f" - Issues: {source.get('max_issues', 0)}\n"
elif source_type == "pdf":
content += f"- ✅ **PDF Document**: {source.get('path', 'N/A')}\n"
display = _source_detail_map.get(source_type, (source_type.title(), "path", []))
label, primary_key, extras = display
primary_val = source.get(primary_key, "N/A")
if isinstance(primary_val, list):
primary_val = ", ".join(str(v) for v in primary_val)
content += f"- ✅ **{label}**: {primary_val}\n"
for extra_label, extra_key, extra_default in extras:
content += f" - {extra_label}: {source.get(extra_key, extra_default)}\n"
# C3.x Architecture & Code Analysis section (if available)
github_data = self.scraped_data.get("github", {})
@@ -796,6 +1029,27 @@ This skill combines knowledge from multiple sources:
if pdf_list:
self._generate_pdf_references(pdf_list)
# Generate references for all additional source types
_extra_source_types = [
"word",
"epub",
"video",
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
]
for source_type in _extra_source_types:
source_list = self.scraped_data.get(source_type, [])
if source_list:
self._generate_generic_references(source_type, source_list)
# Generate merged API reference if available
if self.merged_data:
self._generate_merged_api_reference()
@@ -977,6 +1231,63 @@ This skill combines knowledge from multiple sources:
logger.info(f"Created PDF references ({len(pdf_list)} sources)")
def _generate_generic_references(self, source_type: str, source_list: list[dict]):
"""Generate references for any source type using a generic approach.
Creates a references/<source_type>/ directory with an index and
copies any data files from the source list.
Args:
source_type: The source type key (e.g., 'epub', 'jupyter')
source_list: List of scraped source dicts for this type
"""
if not source_list:
return
label = self._SOURCE_LABELS.get(source_type, source_type.title())
type_dir = os.path.join(self.skill_dir, "references", source_type)
os.makedirs(type_dir, exist_ok=True)
# Create index
index_path = os.path.join(type_dir, "index.md")
with open(index_path, "w", encoding="utf-8") as f:
f.write(f"# {label} References\n\n")
f.write(f"Reference from {len(source_list)} {label} source(s).\n\n")
for i, source_data in enumerate(source_list):
# Try common ID fields
source_id = (
source_data.get("source_id")
or source_data.get(f"{source_type}_id")
or source_data.get("notebook_id")
or source_data.get("spec_id")
or source_data.get("feed_id")
or source_data.get("man_id")
or source_data.get("chat_id")
or f"source_{i}"
)
f.write(f"## {source_id}\n\n")
# Write summary of extracted data
data = source_data.get("data", {})
if isinstance(data, dict):
for key in ["title", "description", "metadata"]:
if key in data:
val = data[key]
if isinstance(val, str) and val:
f.write(f"**{key.title()}:** {val}\n\n")
# Copy data file if available
data_file = source_data.get("data_file")
if data_file and os.path.isfile(data_file):
dest = os.path.join(type_dir, f"{source_id}_data.json")
import contextlib
with contextlib.suppress(OSError):
shutil.copy(data_file, dest)
logger.info(f"Created {label} references ({len(source_list)} sources)")
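The ID fallback chain in `_generate_generic_references` tries several per-type keys before giving up. A compact sketch of that resolution order:

```python
def resolve_source_id(source_data: dict, source_type: str, index: int) -> str:
    """Pick a display ID for a scraped source, falling back to source_<index>."""
    for key in ("source_id", f"{source_type}_id", "notebook_id", "spec_id",
                "feed_id", "man_id", "chat_id"):
        value = source_data.get(key)
        if value:
            return str(value)
    return f"source_{index}"
```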
def _generate_merged_api_reference(self):
"""Generate merged API reference file."""
api_dir = os.path.join(self.skill_dir, "references", "api")

@@ -3,16 +3,16 @@
Skill Seeker MCP Server (FastMCP Implementation)
Modern, decorator-based MCP server using FastMCP for simplified tool registration.
Provides 33 tools for generating Claude AI skills from documentation.
Provides 34 tools for generating Claude AI skills from documentation.
This is a streamlined alternative to server.py (2200 lines → 708 lines, 68% reduction).
All tool implementations are delegated to modular tool files in tools/ directory.
**Architecture:**
- FastMCP server with decorator-based tool registration
- 33 tools organized into 7 categories:
- 34 tools organized into 7 categories:
* Config tools (3): generate_config, list_configs, validate_config
* Scraping tools (10): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
* Scraping tools (11): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns, scrape_generic
* Packaging tools (4): package_skill, upload_skill, enhance_skill, install_skill
* Splitting tools (2): split_config, generate_router
* Source tools (5): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
@@ -97,6 +97,7 @@ try:
remove_config_source_impl,
scrape_codebase_impl,
scrape_docs_impl,
scrape_generic_impl,
scrape_github_impl,
scrape_pdf_impl,
scrape_video_impl,
@@ -141,6 +142,7 @@ except ImportError:
remove_config_source_impl,
scrape_codebase_impl,
scrape_docs_impl,
scrape_generic_impl,
scrape_github_impl,
scrape_pdf_impl,
scrape_video_impl,
@@ -301,7 +303,7 @@ async def sync_config(
# ============================================================================
# SCRAPING TOOLS (10 tools)
# SCRAPING TOOLS (11 tools)
# ============================================================================
@@ -823,6 +825,50 @@ async def extract_config_patterns(
return str(result)
@safe_tool_decorator(
description="Scrape content from new source types: jupyter, html, openapi, asciidoc, pptx, confluence, notion, rss, manpage, chat. A generic entry point that delegates to the appropriate CLI scraper module."
)
async def scrape_generic(
source_type: str,
name: str,
path: str | None = None,
url: str | None = None,
) -> str:
"""
Scrape content from various source types and build a skill.
A generic scraper that supports 10 new source types. It delegates to the
corresponding CLI scraper module (e.g., skill_seekers.cli.jupyter_scraper).
File-based types (jupyter, html, openapi, asciidoc, pptx, manpage, chat)
typically use the 'path' parameter. URL-based types (confluence, notion, rss)
typically use the 'url' parameter.
Args:
source_type: Source type to scrape. One of: jupyter, html, openapi,
asciidoc, pptx, confluence, notion, rss, manpage, chat.
name: Skill name for the output
path: File or directory path (for file-based sources like jupyter, html, pptx)
url: URL (for URL-based sources like confluence, notion, rss)
Returns:
Scraping results with file paths and statistics.
"""
args = {
"source_type": source_type,
"name": name,
}
if path:
args["path"] = path
if url:
args["url"] = url
result = await scrape_generic_impl(args)
if isinstance(result, list) and result:
return result[0].text if hasattr(result[0], "text") else str(result[0])
return str(result)
# ============================================================================
# PACKAGING TOOLS (4 tools)
# ============================================================================

@@ -63,6 +63,9 @@ from .scraping_tools import (
from .scraping_tools import (
scrape_pdf_tool as scrape_pdf_impl,
)
from .scraping_tools import (
scrape_generic_tool as scrape_generic_impl,
)
from .scraping_tools import (
scrape_video_tool as scrape_video_impl,
)
@@ -135,6 +138,7 @@ __all__ = [
"extract_test_examples_impl",
"build_how_to_guides_impl",
"extract_config_patterns_impl",
"scrape_generic_impl",
# Packaging tools
"package_skill_impl",
"upload_skill_impl",

@@ -205,6 +205,18 @@ async def validate_config(args: dict) -> list[TextContent]:
)
elif source["type"] == "pdf":
result += f" Path: {source.get('path', 'N/A')}\n"
elif source["type"] in (
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"manpage",
"chat",
):
result += f" Path: {source.get('path', 'N/A')}\n"
elif source["type"] in ("confluence", "notion", "rss"):
result += f" URL: {source.get('url', 'N/A')}\n"
# Show merge settings if applicable
if validator.needs_api_merge():

View File

@@ -7,6 +7,8 @@ This module contains all scraping-related MCP tool implementations:
- scrape_github_tool: Scrape GitHub repositories
- scrape_pdf_tool: Scrape PDF documentation
- scrape_codebase_tool: Analyze local codebase and extract code knowledge
- scrape_generic_tool: Generic scraper for new source types (jupyter, html,
openapi, asciidoc, pptx, confluence, notion, rss, manpage, chat)
Extracted from server.py for better modularity and organization.
"""
@@ -1005,3 +1007,155 @@ async def extract_config_patterns_tool(args: dict) -> list[TextContent]:
return [TextContent(type="text", text=output_text)]
else:
return [TextContent(type="text", text=f"{output_text}\n\n❌ Error:\n{stderr}")]
# Valid source types for the generic scraper
GENERIC_SOURCE_TYPES = (
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
)
# Mapping from source type to the CLI flag used for the primary input argument.
# URL-based types use --url; file/path-based types use --path.
_URL_BASED_TYPES = {"confluence", "notion", "rss"}
# Friendly emoji labels per source type
_SOURCE_EMOJIS = {
"jupyter": "📓",
"html": "🌐",
"openapi": "📡",
"asciidoc": "📄",
"pptx": "📊",
"confluence": "🏢",
"notion": "📝",
"rss": "📰",
"manpage": "📖",
"chat": "💬",
}
async def scrape_generic_tool(args: dict) -> list[TextContent]:
"""
Generic scraper for new source types.
Handles all 10 new source types by building the appropriate subprocess
command and delegating to the corresponding CLI scraper module.
Supported source types: jupyter, html, openapi, asciidoc, pptx,
confluence, notion, rss, manpage, chat.
Args:
args: Dictionary containing:
- source_type (str): One of the supported source types
- path (str, optional): File or directory path (for file-based sources)
- url (str, optional): URL (for URL-based sources like confluence, notion, rss)
- name (str): Skill name for the output
Returns:
List[TextContent]: Tool execution results
"""
source_type = args.get("source_type", "")
path = args.get("path")
url = args.get("url")
name = args.get("name")
# Validate source_type
if source_type not in GENERIC_SOURCE_TYPES:
return [
TextContent(
type="text",
text=(
f"❌ Error: Unknown source_type '{source_type}'. "
f"Must be one of: {', '.join(GENERIC_SOURCE_TYPES)}"
),
)
]
# Validate that we have either path or url
if not path and not url:
return [
TextContent(
type="text",
text="❌ Error: Must specify either 'path' (file/directory) or 'url'",
)
]
if not name:
return [
TextContent(
type="text",
text="❌ Error: 'name' parameter is required",
)
]
# Build the subprocess command
# Map source type to module name (most are <type>_scraper, but some differ)
_MODULE_NAMES = {
"manpage": "man_scraper",
}
module_name = _MODULE_NAMES.get(source_type, f"{source_type}_scraper")
cmd = [sys.executable, "-m", f"skill_seekers.cli.{module_name}"]
# Map source type to the correct CLI flag for file/path input and URL input.
# Each scraper has its own flag name — using a generic --path or --url would fail.
_PATH_FLAGS: dict[str, str] = {
"jupyter": "--notebook",
"html": "--html-path",
"openapi": "--spec",
"asciidoc": "--asciidoc-path",
"pptx": "--pptx",
"manpage": "--man-path",
"confluence": "--export-path",
"notion": "--export-path",
"rss": "--feed-path",
"chat": "--export-path",
}
_URL_FLAGS: dict[str, str] = {
"confluence": "--base-url",
"notion": "--page-id",
"rss": "--feed-url",
"openapi": "--spec-url",
}
# Determine the input flag based on source type
if source_type in _URL_BASED_TYPES and url:
url_flag = _URL_FLAGS.get(source_type, "--url")
cmd.extend([url_flag, url])
elif path:
path_flag = _PATH_FLAGS.get(source_type, "--path")
cmd.extend([path_flag, path])
elif url:
# Allow url fallback for file-based types (some may accept URLs too)
url_flag = _URL_FLAGS.get(source_type, "--url")
cmd.extend([url_flag, url])
cmd.extend(["--name", name])
# Set a reasonable timeout
timeout = 600 # 10 minutes
emoji = _SOURCE_EMOJIS.get(source_type, "🔧")
progress_msg = f"{emoji} Scraping {source_type} source...\n"
if path:
progress_msg += f"📁 Path: {path}\n"
if url:
progress_msg += f"🔗 URL: {url}\n"
progress_msg += f"📛 Name: {name}\n"
progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
output = progress_msg + stdout
if returncode == 0:
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]

@@ -106,7 +106,9 @@ async def split_config(args: dict) -> list[TextContent]:
Supports both documentation and unified (multi-source) configs:
- Documentation configs: Split by categories, size, or create router skills
- Unified configs: Split by source type (documentation, github, pdf)
- Unified configs: Split by source type (documentation, github, pdf,
jupyter, html, openapi, asciidoc, pptx, confluence, notion, rss,
manpage, chat)
For large documentation sites (10K+ pages), this tool splits the config into
multiple smaller configs. For unified configs with multiple sources, splits

@@ -0,0 +1,222 @@
name: complex-merge
description: Intelligent multi-source merging with conflict resolution, priority rules, and gap analysis
version: "1.0"
author: Skill Seekers
tags:
- merge
- multi-source
- conflict-resolution
- synthesis
applies_to:
- doc_scraping
- codebase_analysis
- github_analysis
variables:
merge_strategy: priority
source_priority_order: "official_docs,code,community"
conflict_resolution: highest_priority
min_sources_for_consensus: 2
stages:
- name: source_inventory
type: custom
target: inventory
uses_history: false
enabled: true
prompt: >
Catalog every source that contributed content to this skill extraction.
For each source, classify its type and assess its characteristics.
For each source, determine:
1. Source type (official_docs, codebase, github_repo, pdf, video, community, blog)
2. Content scope — what topics or areas does this source cover?
3. Freshness — how recent is the content? Look for version numbers, dates, deprecation notices
4. Authority level — is this an official maintainer, core contributor, or third party?
5. Content density — roughly how much substantive information does this source provide?
6. Format characteristics — prose, code samples, API reference, tutorial, etc.
Output JSON with:
- "sources": array of {id, type, scope_summary, topics_covered, freshness_estimate, authority, density, format}
- "source_type_distribution": count of sources by type
- "total_topics_identified": number of unique topics across all sources
- "coverage_summary": brief overview of what the combined sources cover
- name: cross_reference
type: custom
target: cross_references
uses_history: true
enabled: true
prompt: >
Using the source inventory, identify overlapping topics across sources.
Find where multiple sources discuss the same concept, API, feature, or pattern.
For each overlapping topic:
1. List which sources cover it and how deeply
2. Note whether sources agree, complement each other, or diverge
3. Identify the richest source for that topic (most detail, best examples)
4. Flag any terminology differences across sources for the same concept
Output JSON with:
- "overlapping_topics": array of {topic, sources_covering, agreement_level, richest_source, terminology_variants}
- "high_overlap_topics": topics covered by 3+ sources
- "complementary_pairs": pairs of sources that cover different aspects of the same topic well
- "terminology_map": dictionary mapping variant terms to a canonical term
- name: conflict_detection
type: custom
target: conflicts
uses_history: true
enabled: true
prompt: >
Examine the cross-referenced topics and identify genuine contradictions
between sources. Distinguish between true conflicts and superficial differences.
Categories of conflict to detect:
1. Factual contradictions — sources state opposite things about the same feature
2. Version mismatches — sources describe different versions of an API or behavior
3. Best practice disagreements — sources recommend conflicting approaches
4. Deprecated vs current — one source shows deprecated usage another shows current
5. Scope conflicts — sources disagree on what a feature can or cannot do
For each conflict:
- Identify the specific claim from each source
- Assess which source is more likely correct and why
- Recommend a resolution strategy
Output JSON with:
- "conflicts": array of {topic, type, source_a_claim, source_b_claim, likely_correct, resolution_rationale}
- "conflict_count_by_type": breakdown of conflicts by category
- "high_severity_conflicts": conflicts that would mislead users if unresolved
- "auto_resolvable": conflicts that can be resolved by version/date alone
- name: priority_merge
type: custom
target: merged_content
uses_history: true
enabled: true
prompt: >
Merge content from all sources using the following priority hierarchy:
1. Official documentation (highest authority)
2. Source code and inline comments (ground truth for behavior)
3. Community content — tutorials, blog posts, Stack Overflow (practical usage)
Merging rules:
- When sources agree, combine the best explanation with the best examples
- When sources conflict, prefer the higher-priority source but note the alternative
- When only a lower-priority source covers a topic, include it but flag the authority level
- Preserve code examples from any source, annotating their origin
- Deduplicate content — do not repeat the same information from multiple sources
- Normalize terminology using the canonical terms from cross-referencing
For each merged topic, produce:
1. Authoritative explanation (from highest-priority source)
2. Practical examples (best available from any source)
3. Source attribution (which sources contributed)
4. Confidence level (high if official docs confirm, medium if code-only, low if community-only)
Output JSON with:
- "merged_topics": array of {topic, explanation, examples, sources_used, confidence, notes}
- "merge_decisions": array of {topic, decision, rationale} for non-trivial merges
- "source_contribution_stats": how much each source contributed to the final output
- name: gap_analysis
type: custom
target: gaps
uses_history: true
enabled: true
prompt: >
Analyse the merged content to identify gaps — topics or areas that are
underrepresented or missing entirely.
Identify:
1. Single-source topics — covered by only one source, making them fragile
2. Missing fundamentals — core concepts that should be documented but are not
3. Missing examples — topics explained in prose but lacking code samples
4. Missing edge cases — common error scenarios or limitations not documented
5. Broken references — topics that reference other topics not present in any source
6. Audience gaps — content assumes knowledge that is never introduced
For each gap, assess:
- Severity (critical, important, nice-to-have)
- Whether the gap can be inferred from existing content
- Suggested source type that would best fill this gap
Output JSON with:
- "single_source_topics": array of {topic, sole_source, risk_level}
- "missing_fundamentals": topics that should exist but do not
- "example_gaps": topics needing code examples
- "edge_case_gaps": undocumented error scenarios
- "broken_references": internal references with no target
- "gap_severity_summary": counts by severity level
- name: synthesis
type: custom
target: skill_md
uses_history: true
enabled: true
prompt: >
Create a unified, coherent narrative from the merged content. The output
should read as if written by a single knowledgeable author, not as a
patchwork of multiple sources.
Synthesis guidelines:
1. Structure content logically — concepts build on each other
2. Lead with the most important information for each topic
3. Integrate code examples naturally within explanations
4. Use consistent voice, terminology, and formatting throughout
5. Add transition text between topics for narrative flow
6. Include a "Sources and Confidence" appendix noting where information came from
7. Mark any low-confidence or single-source claims with a caveat
8. Fill minor gaps by inference where safe to do so, clearly marking inferred content
Output JSON with:
- "synthesized_sections": array of {title, content, sources_used, confidence}
- "section_order": recommended reading order
- "inferred_content": content that was inferred rather than directly sourced
- "caveats": any warnings about content reliability
- name: quality_check
type: custom
target: quality
uses_history: true
enabled: true
prompt: >
Perform a final quality review of the synthesized output. Evaluate the
merge result against multiple quality dimensions.
Check for:
1. Completeness — does the output cover all topics from all sources?
2. Accuracy — are merged claims consistent and non-contradictory?
3. Coherence — does the document flow logically as a unified piece?
4. Attribution — are source contributions properly tracked?
5. Confidence calibration — are confidence levels appropriate?
6. Example quality — are code examples correct, runnable, and well-annotated?
7. Terminology consistency — is the canonical terminology used throughout?
8. Gap acknowledgment — are known gaps clearly communicated?
Scoring:
- Rate each dimension 1-10
- Provide specific issues found for any dimension scoring below 7
- Suggest concrete fixes for each issue
Output JSON with:
- "quality_scores": {completeness, accuracy, coherence, attribution, confidence_calibration, example_quality, terminology_consistency, gap_acknowledgment}
- "overall_score": weighted average (accuracy and completeness weighted 2x)
- "issues_found": array of {dimension, description, severity, suggested_fix}
- "merge_health": "excellent" | "good" | "needs_review" | "poor" based on overall score
- "recommendations": top 3 actions to improve merge quality
post_process:
reorder_sections:
- overview
- core_concepts
- api_reference
- examples
- advanced_topics
- troubleshooting
- sources_and_confidence
add_metadata:
enhanced: true
workflow: complex-merge
multi_source: true
conflict_resolution: priority
quality_checked: true
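The workflow's `source_priority_order` variable is a comma-separated ranking. A small sketch of how such a value could be turned into a rank lookup for priority-based merging (the helper name is hypothetical, not part of the workflow engine):

```python
def priority_rank(order_csv: str) -> dict[str, int]:
    """Map each source class to its rank; lower rank = higher priority."""
    return {name.strip(): i for i, name in enumerate(order_csv.split(","))}

ranks = priority_rank("official_docs,code,community")
```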

@@ -24,12 +24,12 @@ class TestParserRegistry:
def test_all_parsers_registered(self):
"""Test that all parsers are registered."""
assert len(PARSERS) == 25, f"Expected 25 parsers, got {len(PARSERS)}"
assert len(PARSERS) == 35, f"Expected 35 parsers, got {len(PARSERS)}"
def test_get_parser_names(self):
"""Test getting list of parser names."""
names = get_parser_names()
assert len(names) == 25
assert len(names) == 35
assert "scrape" in names
assert "github" in names
assert "package" in names
@@ -243,9 +243,9 @@ class TestBackwardCompatibility:
assert cmd in names, f"Command '{cmd}' not found in parser registry!"
def test_command_count_matches(self):
"""Test that we have exactly 25 commands (includes create, workflows, word, epub, video, and sync-config)."""
assert len(PARSERS) == 25
assert len(get_parser_names()) == 25
"""Test that we have exactly 35 commands (25 original + 10 new source types)."""
assert len(PARSERS) == 35
assert len(get_parser_names()) == 35
if __name__ == "__main__":

@@ -0,0 +1,824 @@
#!/usr/bin/env python3
"""
Tests for v3.2.0 new source type integration points.
Covers source detection, config validation, generic merge, CLI wiring,
and source validation for the 10 new source types: jupyter, html, openapi,
asciidoc, pptx, rss, manpage, confluence, notion, chat.
"""
import os
import textwrap
import pytest
from skill_seekers.cli.config_validator import ConfigValidator
from skill_seekers.cli.main import COMMAND_MODULES
from skill_seekers.cli.parsers import PARSERS, get_parser_names
from skill_seekers.cli.source_detector import SourceDetector, SourceInfo
from skill_seekers.cli.unified_skill_builder import UnifiedSkillBuilder
# ---------------------------------------------------------------------------
# 1. SourceDetector — new type detection
# ---------------------------------------------------------------------------
class TestSourceDetectorNewTypes:
"""Test that SourceDetector.detect() maps new extensions to correct types."""
# -- Jupyter --
def test_detect_ipynb(self):
"""Test .ipynb → jupyter detection."""
info = SourceDetector.detect("analysis.ipynb")
assert info.type == "jupyter"
assert info.parsed["file_path"] == "analysis.ipynb"
assert info.suggested_name == "analysis"
# -- HTML --
def test_detect_html_extension(self):
"""Test .html → html detection."""
info = SourceDetector.detect("page.html")
assert info.type == "html"
assert info.parsed["file_path"] == "page.html"
def test_detect_htm_extension(self):
"""Test .htm → html detection."""
info = SourceDetector.detect("index.HTM")
assert info.type == "html"
assert info.parsed["file_path"] == "index.HTM"
# -- PowerPoint --
def test_detect_pptx(self):
"""Test .pptx → pptx detection."""
info = SourceDetector.detect("slides.pptx")
assert info.type == "pptx"
assert info.parsed["file_path"] == "slides.pptx"
assert info.suggested_name == "slides"
# -- AsciiDoc --
def test_detect_adoc(self):
"""Test .adoc → asciidoc detection."""
info = SourceDetector.detect("manual.adoc")
assert info.type == "asciidoc"
assert info.parsed["file_path"] == "manual.adoc"
def test_detect_asciidoc_extension(self):
"""Test .asciidoc → asciidoc detection."""
info = SourceDetector.detect("guide.ASCIIDOC")
assert info.type == "asciidoc"
assert info.parsed["file_path"] == "guide.ASCIIDOC"
# -- Man pages --
def test_detect_man_extension(self):
"""Test .man → manpage detection."""
info = SourceDetector.detect("curl.man")
assert info.type == "manpage"
assert info.parsed["file_path"] == "curl.man"
@pytest.mark.parametrize("section", range(1, 9))
def test_detect_man_sections(self, section):
"""Test .1 through .8 → manpage for simple basenames."""
filename = f"git.{section}"
info = SourceDetector.detect(filename)
assert info.type == "manpage", f"{filename} should detect as manpage"
assert info.suggested_name == "git"
def test_man_section_with_dotted_basename_not_detected(self):
"""Test that 'access.log.1' is NOT detected as a man page.
The heuristic checks that the basename (without extension) has no dots.
"""
# This should fall through to web/domain detection (has a dot, not a path)
info = SourceDetector.detect("access.log.1")
# access.log.1 has a dot in the basename-without-ext ("access.log"),
# so it should NOT be detected as manpage. It falls through to the
# domain inference branch because it contains a dot and doesn't start
# with '/'.
assert info.type != "manpage"
# -- RSS/Atom --
def test_detect_rss_extension(self):
"""Test .rss → rss detection."""
info = SourceDetector.detect("feed.rss")
assert info.type == "rss"
assert info.parsed["file_path"] == "feed.rss"
def test_detect_atom_extension(self):
"""Test .atom → rss detection."""
info = SourceDetector.detect("updates.atom")
assert info.type == "rss"
assert info.parsed["file_path"] == "updates.atom"
def test_xml_not_detected_as_rss(self):
"""Test .xml is NOT detected as rss (too generic).
The fix ensures .xml files do not get incorrectly classified as RSS feeds.
"""
# .xml has no special handling — it will fall through to domain inference
# or raise ValueError depending on contents. Either way, it must not
# be classified as "rss".
info = SourceDetector.detect("data.xml")
assert info.type != "rss"
# -- OpenAPI --
def test_yaml_with_openapi_content_detected(self, tmp_path):
"""Test .yaml with 'openapi:' key → openapi detection."""
spec = tmp_path / "petstore.yaml"
spec.write_text(
textwrap.dedent("""\
openapi: "3.0.0"
info:
title: Petstore
version: "1.0.0"
paths: {}
""")
)
info = SourceDetector.detect(str(spec))
assert info.type == "openapi"
assert info.parsed["file_path"] == str(spec)
assert info.suggested_name == "petstore"
def test_yaml_with_swagger_content_detected(self, tmp_path):
"""Test .yaml with 'swagger:' key → openapi detection."""
spec = tmp_path / "legacy.yml"
spec.write_text(
textwrap.dedent("""\
swagger: "2.0"
info:
title: Legacy API
basePath: /v1
""")
)
info = SourceDetector.detect(str(spec))
assert info.type == "openapi"
def test_yaml_without_openapi_not_detected(self, tmp_path):
"""Test .yaml without OpenAPI content is NOT detected as openapi.
When the YAML file doesn't contain openapi/swagger keys the detector
skips OpenAPI and falls through. For an absolute path it will raise
ValueError (cannot determine type), which still confirms it was NOT
classified as openapi.
"""
plain = tmp_path / "config.yaml"
plain.write_text("name: my-project\nversion: 1.0\n")
# Absolute path falls through to ValueError (no matching type).
# Either way, it must NOT be "openapi".
try:
info = SourceDetector.detect(str(plain))
assert info.type != "openapi"
except ValueError:
# Raised because source type cannot be determined — this is fine,
# the important thing is it was not classified as openapi.
pass
def test_looks_like_openapi_returns_false_for_missing_file(self):
"""Test _looks_like_openapi returns False for non-existent file."""
assert SourceDetector._looks_like_openapi("/nonexistent/spec.yaml") is False
def test_looks_like_openapi_json_key_format(self, tmp_path):
"""Test _looks_like_openapi detects JSON-style keys (quoted)."""
spec = tmp_path / "api.yaml"
spec.write_text('"openapi": "3.0.0"\n')
assert SourceDetector._looks_like_openapi(str(spec)) is True
# ---------------------------------------------------------------------------
# 2. ConfigValidator — new source type validation
# ---------------------------------------------------------------------------
class TestConfigValidatorNewTypes:
"""Test ConfigValidator VALID_SOURCE_TYPES and per-type validation."""
# All 17 expected types
EXPECTED_TYPES = {
"documentation",
"github",
"pdf",
"local",
"word",
"video",
"epub",
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
}
def test_all_17_types_present(self):
"""Test that VALID_SOURCE_TYPES contains all 17 types."""
assert ConfigValidator.VALID_SOURCE_TYPES == self.EXPECTED_TYPES
def test_unknown_type_rejected(self):
"""Test that an unknown source type is rejected during validation."""
config = {
"name": "test",
"description": "test",
"sources": [{"type": "foobar"}],
}
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Invalid type 'foobar'"):
validator.validate()
# --- Per-type required-field validation ---
def _make_config(self, source: dict) -> dict:
"""Helper: wrap a source dict in a valid config structure."""
return {
"name": "test",
"description": "test",
"sources": [source],
}
def test_epub_requires_path(self):
"""Test epub source validation requires 'path'."""
config = self._make_config({"type": "epub"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'"):
validator.validate()
def test_jupyter_requires_path(self):
"""Test jupyter source validation requires 'path'."""
config = self._make_config({"type": "jupyter"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'"):
validator.validate()
def test_html_requires_path(self):
"""Test html source validation requires 'path'."""
config = self._make_config({"type": "html"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'"):
validator.validate()
def test_openapi_requires_path_or_url(self):
"""Test openapi source validation requires 'path' or 'url'."""
config = self._make_config({"type": "openapi"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path' or 'url'"):
validator.validate()
def test_openapi_accepts_url(self):
"""Test openapi source passes validation with 'url'."""
config = self._make_config({"type": "openapi", "url": "https://example.com/spec.yaml"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_pptx_requires_path(self):
"""Test pptx source validation requires 'path'."""
config = self._make_config({"type": "pptx"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'"):
validator.validate()
def test_asciidoc_requires_path(self):
"""Test asciidoc source validation requires 'path'."""
config = self._make_config({"type": "asciidoc"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'"):
validator.validate()
def test_confluence_requires_url_or_path(self):
"""Test confluence requires 'url'/'base_url' or 'path'."""
config = self._make_config({"type": "confluence"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field"):
validator.validate()
def test_confluence_accepts_base_url(self):
"""Test confluence passes with base_url + space_key."""
config = self._make_config(
{
"type": "confluence",
"base_url": "https://wiki.example.com",
"space_key": "DEV",
}
)
validator = ConfigValidator(config)
assert validator.validate() is True
def test_confluence_accepts_path(self):
"""Test confluence passes with export path."""
config = self._make_config({"type": "confluence", "path": "/exports/wiki"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_notion_requires_url_or_path(self):
"""Test notion requires 'url'/'database_id'/'page_id' or 'path'."""
config = self._make_config({"type": "notion"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field"):
validator.validate()
def test_notion_accepts_page_id(self):
"""Test notion passes with page_id."""
config = self._make_config({"type": "notion", "page_id": "abc123"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_notion_accepts_database_id(self):
"""Test notion passes with database_id."""
config = self._make_config({"type": "notion", "database_id": "db-456"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_rss_requires_url_or_path(self):
"""Test rss source validation requires 'url' or 'path'."""
config = self._make_config({"type": "rss"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'url' or 'path'"):
validator.validate()
def test_rss_accepts_url(self):
"""Test rss passes with url."""
config = self._make_config({"type": "rss", "url": "https://blog.example.com/feed.xml"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_manpage_requires_path_or_names(self):
"""Test manpage source validation requires 'path' or 'names'."""
config = self._make_config({"type": "manpage"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path' or 'names'"):
validator.validate()
def test_manpage_accepts_names(self):
"""Test manpage passes with 'names' list."""
config = self._make_config({"type": "manpage", "names": ["git", "curl"]})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_chat_requires_path_or_token(self):
"""Test chat source validation requires 'path' or 'token'."""
config = self._make_config({"type": "chat"})
validator = ConfigValidator(config)
with pytest.raises(ValueError, match="Missing required field 'path'.*or 'token'"):
validator.validate()
def test_chat_accepts_path(self):
"""Test chat passes with export path."""
config = self._make_config({"type": "chat", "path": "/exports/slack"})
validator = ConfigValidator(config)
assert validator.validate() is True
def test_chat_accepts_token_with_channel(self):
"""Test chat passes with API token + channel."""
config = self._make_config(
{
"type": "chat",
"token": "xoxb-fake",
"channel": "#general",
}
)
validator = ConfigValidator(config)
assert validator.validate() is True
# ---------------------------------------------------------------------------
# 3. UnifiedSkillBuilder — generic merge system
# ---------------------------------------------------------------------------
class TestUnifiedSkillBuilderGenericMerge:
"""Test _generic_merge, _append_extra_sources, and _SOURCE_LABELS."""
def _make_builder(self, tmp_path) -> UnifiedSkillBuilder:
"""Create a minimal builder instance for testing."""
config = {
"name": "test_project",
"description": "A test project for merge testing",
"sources": [
{"type": "jupyter", "path": "nb.ipynb"},
{"type": "rss", "url": "https://example.com/feed.rss"},
],
}
scraped_data: dict = {}
builder = UnifiedSkillBuilder(
config=config,
scraped_data=scraped_data,
cache_dir=str(tmp_path / "cache"),
)
# Override skill_dir to use tmp_path
builder.skill_dir = str(tmp_path / "output" / "test_project")
os.makedirs(builder.skill_dir, exist_ok=True)
os.makedirs(os.path.join(builder.skill_dir, "references"), exist_ok=True)
return builder
def test_generic_merge_produces_valid_markdown(self, tmp_path):
"""Test _generic_merge with two source types produces markdown."""
builder = self._make_builder(tmp_path)
skill_mds = {
"jupyter": "## When to Use\n\nFor data analysis.\n\n## Quick Reference\n\nImport pandas.",
"rss": "## When to Use\n\nFor feed monitoring.\n\n## Feed Items\n\nLatest entries.",
}
result = builder._generic_merge(skill_mds)
# Must be non-empty markdown
assert len(result) > 100
# Must contain the project title
assert "Test Project" in result
def test_generic_merge_includes_yaml_frontmatter(self, tmp_path):
"""Test _generic_merge includes YAML frontmatter."""
builder = self._make_builder(tmp_path)
skill_mds = {
"html": "## Overview\n\nHTML content here.",
}
result = builder._generic_merge(skill_mds)
assert result.startswith("---\n")
assert "name: test-project" in result
assert "description: A test project" in result
def test_generic_merge_attributes_content_to_sources(self, tmp_path):
"""Test _generic_merge attributes content to correct source labels."""
builder = self._make_builder(tmp_path)
skill_mds = {
"jupyter": "## Overview\n\nNotebook content.",
"pptx": "## Overview\n\nSlide content.",
}
result = builder._generic_merge(skill_mds)
# Check source labels appear
assert "Jupyter Notebook" in result
assert "PowerPoint Presentation" in result
def test_generic_merge_single_source_section(self, tmp_path):
"""Test section unique to one source has 'From <Label>' attribution."""
builder = self._make_builder(tmp_path)
skill_mds = {
"manpage": "## Synopsis\n\ngit [options]",
}
result = builder._generic_merge(skill_mds)
assert "*From Man Page*" in result
assert "## Synopsis" in result
def test_generic_merge_multi_source_section(self, tmp_path):
"""Test section shared by multiple sources gets sub-headings per source."""
builder = self._make_builder(tmp_path)
skill_mds = {
"asciidoc": "## Quick Reference\n\nAsciiDoc quick ref.",
"html": "## Quick Reference\n\nHTML quick ref.",
}
result = builder._generic_merge(skill_mds)
# Both sources should be attributed under the shared section
assert "### From AsciiDoc Document" in result
assert "### From HTML Document" in result
def test_generic_merge_footer(self, tmp_path):
"""Test _generic_merge ends with the standard footer."""
builder = self._make_builder(tmp_path)
skill_mds = {
"rss": "## Feeds\n\nSome feeds.",
}
result = builder._generic_merge(skill_mds)
assert "Generated by Skill Seeker" in result
def test_generic_merge_merged_from_line(self, tmp_path):
"""Test _generic_merge includes 'Merged from:' with correct labels."""
builder = self._make_builder(tmp_path)
skill_mds = {
"confluence": "## Pages\n\nWiki pages.",
"notion": "## Databases\n\nNotion DBs.",
}
result = builder._generic_merge(skill_mds)
assert "*Merged from: Confluence Wiki, Notion Page*" in result
def test_append_extra_sources_adds_sections(self, tmp_path):
"""Test _append_extra_sources adds new sections to base content."""
builder = self._make_builder(tmp_path)
base_content = "# Test\n\nIntro.\n\n## Main Section\n\nContent.\n\n---\n\n*Footer*\n"
skill_mds = {
"epub": "## Chapters\n\nChapter list.\n\n## Key Concepts\n\nConcept A.",
}
result = builder._append_extra_sources(base_content, skill_mds, {"epub"})
# The extra source content should be inserted before the footer separator
assert "EPUB E-book Content" in result
assert "Chapters" in result
assert "Key Concepts" in result
# Original content should still be present
assert "# Test" in result
assert "## Main Section" in result
def test_append_extra_sources_preserves_footer(self, tmp_path):
"""Test _append_extra_sources keeps the footer intact."""
builder = self._make_builder(tmp_path)
base_content = "# Test\n\n---\n\n*Footer*\n"
skill_mds = {
"chat": "## Messages\n\nChat history.",
}
result = builder._append_extra_sources(base_content, skill_mds, {"chat"})
assert "*Footer*" in result
def test_source_labels_has_all_17_types(self):
"""Test _SOURCE_LABELS has entries for all 17 source types."""
expected = {
"documentation",
"github",
"pdf",
"word",
"epub",
"video",
"local",
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"confluence",
"notion",
"rss",
"manpage",
"chat",
}
assert set(UnifiedSkillBuilder._SOURCE_LABELS.keys()) == expected
def test_source_labels_values_are_nonempty_strings(self):
"""Test all _SOURCE_LABELS values are non-empty strings."""
for key, label in UnifiedSkillBuilder._SOURCE_LABELS.items():
assert isinstance(label, str), f"Label for '{key}' is not a string"
assert len(label) > 0, f"Label for '{key}' is empty"
# ---------------------------------------------------------------------------
# 4. COMMAND_MODULES and parser wiring
# ---------------------------------------------------------------------------
class TestCommandModules:
"""Test that all 10 new source types are wired into CLI."""
NEW_COMMAND_NAMES = [
"jupyter",
"html",
"openapi",
"asciidoc",
"pptx",
"rss",
"manpage",
"confluence",
"notion",
"chat",
]
def test_new_types_in_command_modules(self):
"""Test all 10 new source types are in COMMAND_MODULES."""
for cmd in self.NEW_COMMAND_NAMES:
assert cmd in COMMAND_MODULES, f"'{cmd}' not in COMMAND_MODULES"
def test_command_modules_values_are_module_paths(self):
"""Test COMMAND_MODULES values look like importable module paths."""
for cmd in self.NEW_COMMAND_NAMES:
module_path = COMMAND_MODULES[cmd]
assert module_path.startswith("skill_seekers.cli."), (
f"Module path for '{cmd}' doesn't start with 'skill_seekers.cli.'"
)
def test_new_parser_names_include_all_10(self):
"""Test that get_parser_names() includes all 10 new source types."""
names = get_parser_names()
for cmd in self.NEW_COMMAND_NAMES:
assert cmd in names, f"Parser '{cmd}' not registered"
def test_total_parser_count(self):
"""Test total PARSERS count is 35 (25 original + 10 new)."""
assert len(PARSERS) == 35
def test_no_duplicate_parser_names(self):
"""Test no duplicate parser names exist."""
names = get_parser_names()
assert len(names) == len(set(names)), "Duplicate parser names found!"
def test_command_module_count(self):
"""Test COMMAND_MODULES has expected number of entries."""
# 25 original + 10 new = 35
assert len(COMMAND_MODULES) == 35
# ---------------------------------------------------------------------------
# 5. SourceDetector.validate_source — new types
# ---------------------------------------------------------------------------
class TestSourceDetectorValidation:
"""Test validate_source for new file-based source types."""
def test_validation_passes_for_existing_jupyter(self, tmp_path):
"""Test validation passes for an existing .ipynb file."""
nb = tmp_path / "test.ipynb"
nb.write_text('{"cells": []}')
info = SourceInfo(
type="jupyter",
parsed={"file_path": str(nb)},
suggested_name="test",
raw_input=str(nb),
)
# Should not raise
SourceDetector.validate_source(info)
def test_validation_raises_for_nonexistent_jupyter(self):
"""Test validation raises ValueError for non-existent file."""
info = SourceInfo(
type="jupyter",
parsed={"file_path": "/nonexistent/notebook.ipynb"},
suggested_name="notebook",
raw_input="/nonexistent/notebook.ipynb",
)
with pytest.raises(ValueError, match="does not exist"):
SourceDetector.validate_source(info)
def test_validation_passes_for_existing_html(self, tmp_path):
"""Test validation passes for an existing .html file."""
html = tmp_path / "page.html"
html.write_text("<html></html>")
info = SourceInfo(
type="html",
parsed={"file_path": str(html)},
suggested_name="page",
raw_input=str(html),
)
SourceDetector.validate_source(info)
def test_validation_raises_for_nonexistent_pptx(self):
"""Test validation raises ValueError for non-existent pptx."""
info = SourceInfo(
type="pptx",
parsed={"file_path": "/nonexistent/slides.pptx"},
suggested_name="slides",
raw_input="/nonexistent/slides.pptx",
)
with pytest.raises(ValueError, match="does not exist"):
SourceDetector.validate_source(info)
def test_validation_passes_for_existing_openapi(self, tmp_path):
"""Test validation passes for an existing OpenAPI spec file."""
spec = tmp_path / "api.yaml"
spec.write_text("openapi: '3.0.0'\n")
info = SourceInfo(
type="openapi",
parsed={"file_path": str(spec)},
suggested_name="api",
raw_input=str(spec),
)
SourceDetector.validate_source(info)
def test_validation_raises_for_nonexistent_asciidoc(self):
"""Test validation raises ValueError for non-existent asciidoc."""
info = SourceInfo(
type="asciidoc",
parsed={"file_path": "/nonexistent/doc.adoc"},
suggested_name="doc",
raw_input="/nonexistent/doc.adoc",
)
with pytest.raises(ValueError, match="does not exist"):
SourceDetector.validate_source(info)
def test_validation_raises_for_nonexistent_manpage(self):
"""Test validation raises ValueError for non-existent manpage."""
info = SourceInfo(
type="manpage",
parsed={"file_path": "/nonexistent/git.1"},
suggested_name="git",
raw_input="/nonexistent/git.1",
)
with pytest.raises(ValueError, match="does not exist"):
SourceDetector.validate_source(info)
def test_validation_passes_for_existing_manpage(self, tmp_path):
"""Test validation passes for an existing man page file."""
man = tmp_path / "curl.1"
man.write_text(".TH CURL 1\n")
info = SourceInfo(
type="manpage",
parsed={"file_path": str(man)},
suggested_name="curl",
raw_input=str(man),
)
SourceDetector.validate_source(info)
def test_rss_url_validation_no_file_check(self):
"""Test rss validation passes for URL-based source (no file check)."""
info = SourceInfo(
type="rss",
parsed={"url": "https://example.com/feed.rss"},
suggested_name="feed",
raw_input="https://example.com/feed.rss",
)
# rss validation only checks file if file_path is present; URL should pass
SourceDetector.validate_source(info)
def test_rss_validation_raises_for_nonexistent_file(self):
"""Test rss validation raises for non-existent local file."""
info = SourceInfo(
type="rss",
parsed={"file_path": "/nonexistent/feed.rss"},
suggested_name="feed",
raw_input="/nonexistent/feed.rss",
)
with pytest.raises(ValueError, match="does not exist"):
SourceDetector.validate_source(info)
def test_rss_validation_passes_for_existing_file(self, tmp_path):
"""Test rss validation passes for an existing .rss file."""
rss = tmp_path / "feed.rss"
rss.write_text("<rss></rss>")
info = SourceInfo(
type="rss",
parsed={"file_path": str(rss)},
suggested_name="feed",
raw_input=str(rss),
)
SourceDetector.validate_source(info)
def test_validation_passes_for_directory_types(self, tmp_path):
"""Test validation passes when source is a directory (e.g., html dir)."""
html_dir = tmp_path / "pages"
html_dir.mkdir()
info = SourceInfo(
type="html",
parsed={"file_path": str(html_dir)},
suggested_name="pages",
raw_input=str(html_dir),
)
# The validator allows directories for these types (isfile or isdir)
SourceDetector.validate_source(info)
# ---------------------------------------------------------------------------
# 6. CreateCommand._route_generic coverage
# ---------------------------------------------------------------------------
class TestCreateCommandRouting:
"""Test that CreateCommand._route_to_scraper maps new types to _route_generic."""
# We can't easily call _route_to_scraper (it imports real scrapers),
# but we verify the routing table is correct by checking the method source.
GENERIC_ROUTES = {
"jupyter": ("jupyter_scraper", "--notebook"),
"html": ("html_scraper", "--html-path"),
"openapi": ("openapi_scraper", "--spec"),
"asciidoc": ("asciidoc_scraper", "--asciidoc-path"),
"pptx": ("pptx_scraper", "--pptx"),
"rss": ("rss_scraper", "--feed-path"),
"manpage": ("man_scraper", "--man-path"),
"confluence": ("confluence_scraper", "--export-path"),
"notion": ("notion_scraper", "--export-path"),
"chat": ("chat_scraper", "--export-path"),
}
def test_route_to_scraper_source_coverage(self):
"""Test _route_to_scraper method handles all 10 new types.
We inspect the method source to verify each type has a branch.
"""
import inspect
source = inspect.getsource(
__import__(
"skill_seekers.cli.create_command",
fromlist=["CreateCommand"],
).CreateCommand._route_to_scraper
)
for source_type in self.GENERIC_ROUTES:
assert f'"{source_type}"' in source, (
f"_route_to_scraper missing branch for '{source_type}'"
)
def test_generic_route_module_names(self):
"""Test _route_generic is called with correct module names."""
import inspect
source = inspect.getsource(
__import__(
"skill_seekers.cli.create_command",
fromlist=["CreateCommand"],
).CreateCommand._route_to_scraper
)
for source_type, (module, flag) in self.GENERIC_ROUTES.items():
assert f'"{module}"' in source, f"Module name '{module}' not found for '{source_type}'"
assert f'"{flag}"' in source, f"Flag '{flag}' not found for '{source_type}'"
if __name__ == "__main__":
pytest.main([__file__, "-v"])

uv.lock generated

@@ -220,6 +220,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/7f/9c/36c5c37947ebfb8c7f22e0eb6e4d188ee2d53aa3880f3f2744fb894f0cb1/anyio-4.12.0-py3-none-any.whl", hash = "sha256:dad2376a628f98eeca4881fc56cd06affd18f659b17a747d3ff0307ced94b1bb", size = 113362, upload-time = "2025-11-28T23:36:57.897Z" },
]
[[package]]
name = "asciidoc"
version = "10.2.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/1d/e7/315a82f2d256e9270977aa3c15e8fe281fd7c40b8e2a0b97e0cb61ca8fa0/asciidoc-10.2.1.tar.gz", hash = "sha256:d9f13c285981b3c7eb660d02ca0a2779981e88d48105de81bb40445e60dddb83", size = 230179, upload-time = "2024-07-17T03:12:52.681Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/75/1f/87941eaa96e86aa22086064f67e4187e2710fb76c147312979ea29278dac/asciidoc-10.2.1-py2.py3-none-any.whl", hash = "sha256:3f277a636b617c9ce7e0b87bcaea51f144500e9a5c8a6488421ee24594850d40", size = 272433, upload-time = "2024-07-17T03:12:49.012Z" },
]
[[package]]
name = "async-timeout"
version = "5.0.1"
@@ -229,6 +238,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/fe/ba/e2081de779ca30d473f21f5b30e0e737c438205440784c7dfc81efc2b029/async_timeout-5.0.1-py3-none-any.whl", hash = "sha256:39e3809566ff85354557ec2398b55e096c8364bacac9405a7a1fa429e77fe76c", size = 6233, upload-time = "2024-11-06T16:41:37.9Z" },
]
[[package]]
name = "atlassian-python-api"
version = "4.0.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "beautifulsoup4" },
{ name = "deprecated" },
{ name = "jmespath" },
{ name = "oauthlib" },
{ name = "requests" },
{ name = "requests-oauthlib" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/40/e8/f23b7273e410c6fe9f98f9db25268c6736572f22a9566d1dc9ed3614bb68/atlassian_python_api-4.0.7.tar.gz", hash = "sha256:8d9cc6068b1d2a48eb434e22e57f6bbd918a47fac9e46b95b7a3cefb00fceacb", size = 271149, upload-time = "2025-08-21T13:19:40.746Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/1d/83/e4f9976ce3c933a079b8931325e7a9c0a8bba7030a2cb85764c0048f3479/atlassian_python_api-4.0.7-py3-none-any.whl", hash = "sha256:46a70cb29eaab87c0a1697fccd3e25df1aa477e6aa4fb9ba936a9d46b425933c", size = 197746, upload-time = "2025-08-21T13:19:39.044Z" },
]
[[package]]
name = "attrs"
version = "25.4.0"
@@ -1135,6 +1162,27 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/05/99/49ee85903dee060d9f08297b4a342e5e0bcfca2f027a07b4ee0a38ab13f9/faster_whisper-1.2.1-py3-none-any.whl", hash = "sha256:79a66ad50688c0b794dd501dc340a736992a6342f7f95e5811be60b5224a26a7", size = 1118909, upload-time = "2025-10-31T11:35:47.794Z" },
]
[[package]]
name = "fastjsonschema"
version = "2.21.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/20/b5/23b216d9d985a956623b6bd12d4086b60f0059b27799f23016af04a74ea1/fastjsonschema-2.21.2.tar.gz", hash = "sha256:b1eb43748041c880796cd077f1a07c3d94e93ae84bba5ed36800a33554ae05de", size = 374130, upload-time = "2025-08-14T18:49:36.666Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/a8/20d0723294217e47de6d9e2e40fd4a9d2f7c4b6ef974babd482a59743694/fastjsonschema-2.21.2-py3-none-any.whl", hash = "sha256:1c797122d0a86c5cace2e54bf4e819c36223b552017172f32c5c024a6b77e463", size = 24024, upload-time = "2025-08-14T18:49:34.776Z" },
]
[[package]]
name = "feedparser"
version = "6.0.12"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "sgmllib3k" },
]
sdist = { url = "https://files.pythonhosted.org/packages/dc/79/db7edb5e77d6dfbc54d7d9df72828be4318275b2e580549ff45a962f6461/feedparser-6.0.12.tar.gz", hash = "sha256:64f76ce90ae3e8ef5d1ede0f8d3b50ce26bcce71dd8ae5e82b1cd2d4a5f94228", size = 286579, upload-time = "2025-09-10T13:33:59.486Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4e/eb/c96d64137e29ae17d83ad2552470bafe3a7a915e85434d9942077d7fd011/feedparser-6.0.12-py3-none-any.whl", hash = "sha256:6bbff10f5a52662c00a2e3f86a38928c37c48f77b3c511aedcd51de933549324", size = 81480, upload-time = "2025-09-10T13:33:58.022Z" },
]
[[package]]
name = "ffmpeg-python"
version = "0.2.0"
@@ -2100,6 +2148,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/41/45/1a4ed80516f02155c51f51e8cedb3c1902296743db0bbc66608a0db2814f/jsonschema_specifications-2025.9.1-py3-none-any.whl", hash = "sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe", size = 18437, upload-time = "2025-09-08T01:34:57.871Z" },
]
[[package]]
name = "jupyter-core"
version = "5.9.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "platformdirs" },
{ name = "traitlets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/02/49/9d1284d0dc65e2c757b74c6687b6d319b02f822ad039e5c512df9194d9dd/jupyter_core-5.9.1.tar.gz", hash = "sha256:4d09aaff303b9566c3ce657f580bd089ff5c91f5f89cf7d8846c3cdf465b5508", size = 89814, upload-time = "2025-10-16T19:19:18.444Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e7/e7/80988e32bf6f73919a113473a604f5a8f09094de312b9d52b79c2df7612b/jupyter_core-5.9.1-py3-none-any.whl", hash = "sha256:ebf87fdc6073d142e114c72c9e29a9d7ca03fad818c5d300ce2adc1fb0743407", size = 29032, upload-time = "2025-10-16T19:19:16.783Z" },
]
[[package]]
name = "kubernetes"
version = "35.0.0"
@@ -3122,6 +3183,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/79/7b/2c79738432f5c924bef5071f933bcc9efd0473bac3b4aa584a6f7c1c8df8/mypy_extensions-1.1.0-py3-none-any.whl", hash = "sha256:1be4cccdb0f2482337c4743e60421de3a356cd97508abadd57d47403e94f5505", size = 4963, upload-time = "2025-04-22T14:54:22.983Z" },
]
[[package]]
name = "nbformat"
version = "5.10.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "fastjsonschema" },
{ name = "jsonschema" },
{ name = "jupyter-core" },
{ name = "traitlets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/6d/fd/91545e604bc3dad7dca9ed03284086039b294c6b3d75c0d2fa45f9e9caf3/nbformat-5.10.4.tar.gz", hash = "sha256:322168b14f937a5d11362988ecac2a4952d3d8e3a2cbeb2319584631226d5b3a", size = 142749, upload-time = "2024-04-04T11:20:37.371Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a9/82/0340caa499416c78e5d8f5f05947ae4bc3cba53c9f038ab6e9ed964e22f1/nbformat-5.10.4-py3-none-any.whl", hash = "sha256:3b48d6c8fbca4b299bf3982ea7db1af21580e4fec269ad087b9e81588891200b", size = 78454, upload-time = "2024-04-04T11:20:34.895Z" },
]
[[package]]
name = "nest-asyncio"
version = "1.6.0"
@@ -3173,6 +3249,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/60/90/81ac364ef94209c100e12579629dc92bf7a709a84af32f8c551b02c07e94/nltk-3.9.2-py3-none-any.whl", hash = "sha256:1e209d2b3009110635ed9709a67a1a3e33a10f799490fa71cf4bec218c11c88a", size = 1513404, upload-time = "2025-10-01T07:19:21.648Z" },
]
[[package]]
name = "notion-client"
version = "3.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a5/39/60afcbc0148c3dafaaefe851ae3f058077db49d66288dfb218a11a57b997/notion_client-3.0.0.tar.gz", hash = "sha256:05c4d2b4fa3491dc0de21c9c826277202ea8b8714077ee7f51a6e1a09ab23d0f", size = 31357, upload-time = "2026-02-16T11:15:48.024Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/aa/ce/6b03f9aedd2edfcc28e23ced5c2582d543f6ddbb2be5c570533f02890b27/notion_client-3.0.0-py2.py3-none-any.whl", hash = "sha256:177fc3d2ace7e8ef69cf96f46269e8a66071c2c7c526194bf06ce7925853e759", size = 18746, upload-time = "2026-02-16T11:15:46.602Z" },
]
[[package]]
name = "numpy"
version = "2.2.6"
@@ -4789,6 +4877,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/aa/76/03af049af4dcee5d27442f71b6924f01f3efb5d2bd34f23fcd563f2cc5f5/python_multipart-0.0.21-py3-none-any.whl", hash = "sha256:cf7a6713e01c87aa35387f4774e812c4361150938d20d232800f75ffcf266090", size = 24541, upload-time = "2025-12-17T09:24:21.153Z" },
]
[[package]]
name = "python-pptx"
version = "1.0.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "lxml" },
{ name = "pillow" },
{ name = "typing-extensions" },
{ name = "xlsxwriter" },
]
sdist = { url = "https://files.pythonhosted.org/packages/52/a9/0c0db8d37b2b8a645666f7fd8accea4c6224e013c42b1d5c17c93590cd06/python_pptx-1.0.2.tar.gz", hash = "sha256:479a8af0eaf0f0d76b6f00b0887732874ad2e3188230315290cd1f9dd9cc7095", size = 10109297, upload-time = "2024-08-07T17:33:37.772Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/4f/00be2196329ebbff56ce564aa94efb0fbc828d00de250b1980de1a34ab49/python_pptx-1.0.2-py3-none-any.whl", hash = "sha256:160838e0b8565a8b1f67947675886e9fea18aa5e795db7ae531606d68e785cba", size = 472788, upload-time = "2024-08-07T17:33:28.192Z" },
]
[[package]]
name = "pytz"
version = "2025.2"
@@ -5570,6 +5673,12 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/e3/c164c88b2e5ce7b24d667b9bd83589cf4f3520d97cad01534cd3c4f55fdb/setuptools-81.0.0-py3-none-any.whl", hash = "sha256:fdd925d5c5d9f62e4b74b30d6dd7828ce236fd6ed998a08d81de62ce5a6310d6", size = 1062021, upload-time = "2026-02-06T21:10:37.175Z" },
]
[[package]]
name = "sgmllib3k"
version = "1.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/9e/bd/3704a8c3e0942d711c1299ebf7b9091930adae6675d7c8f476a7ce48653c/sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9", size = 5750, upload-time = "2010-08-24T14:33:52.445Z" }
[[package]]
name = "shellingham"
version = "1.5.4"
@@ -5619,23 +5728,30 @@ dependencies = [
[package.optional-dependencies]
all = [
{ name = "asciidoc" },
{ name = "atlassian-python-api" },
{ name = "azure-storage-blob" },
{ name = "boto3" },
{ name = "chromadb" },
{ name = "ebooklib" },
{ name = "fastapi" },
{ name = "feedparser" },
{ name = "google-cloud-storage" },
{ name = "google-generativeai" },
{ name = "httpx" },
{ name = "httpx-sse" },
{ name = "mammoth" },
{ name = "mcp" },
{ name = "nbformat" },
{ name = "notion-client" },
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
{ name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
{ name = "openai" },
{ name = "pinecone" },
{ name = "python-docx" },
{ name = "python-pptx" },
{ name = "sentence-transformers" },
{ name = "slack-sdk" },
{ name = "sse-starlette" },
{ name = "starlette" },
{ name = "uvicorn" },
@@ -5653,12 +5769,21 @@ all-llms = [
{ name = "google-generativeai" },
{ name = "openai" },
]
asciidoc = [
{ name = "asciidoc" },
]
azure = [
{ name = "azure-storage-blob" },
]
chat = [
{ name = "slack-sdk" },
]
chroma = [
{ name = "chromadb" },
]
confluence = [
{ name = "atlassian-python-api" },
]
docx = [
{ name = "mammoth" },
{ name = "python-docx" },
@@ -5680,6 +5805,9 @@ gcs = [
gemini = [
{ name = "google-generativeai" },
]
jupyter = [
{ name = "nbformat" },
]
mcp = [
{ name = "httpx" },
{ name = "httpx-sse" },
@@ -5688,18 +5816,27 @@ mcp = [
{ name = "starlette" },
{ name = "uvicorn" },
]
notion = [
{ name = "notion-client" },
]
openai = [
{ name = "openai" },
]
pinecone = [
{ name = "pinecone" },
]
pptx = [
{ name = "python-pptx" },
]
rag-upload = [
{ name = "chromadb" },
{ name = "pinecone" },
{ name = "sentence-transformers" },
{ name = "weaviate-client" },
]
rss = [
{ name = "feedparser" },
]
s3 = [
{ name = "boto3" },
]
@@ -5743,6 +5880,10 @@ dev = [
[package.metadata]
requires-dist = [
{ name = "anthropic", specifier = ">=0.76.0" },
{ name = "asciidoc", marker = "extra == 'all'", specifier = ">=10.0.0" },
{ name = "asciidoc", marker = "extra == 'asciidoc'", specifier = ">=10.0.0" },
{ name = "atlassian-python-api", marker = "extra == 'all'", specifier = ">=3.41.0" },
{ name = "atlassian-python-api", marker = "extra == 'confluence'", specifier = ">=3.41.0" },
{ name = "azure-storage-blob", marker = "extra == 'all'", specifier = ">=12.19.0" },
{ name = "azure-storage-blob", marker = "extra == 'all-cloud'", specifier = ">=12.19.0" },
{ name = "azure-storage-blob", marker = "extra == 'azure'", specifier = ">=12.19.0" },
@@ -5759,6 +5900,8 @@ requires-dist = [
{ name = "fastapi", marker = "extra == 'all'", specifier = ">=0.109.0" },
{ name = "fastapi", marker = "extra == 'embedding'", specifier = ">=0.109.0" },
{ name = "faster-whisper", marker = "extra == 'video-full'", specifier = ">=1.0.0" },
{ name = "feedparser", marker = "extra == 'all'", specifier = ">=6.0.0" },
{ name = "feedparser", marker = "extra == 'rss'", specifier = ">=6.0.0" },
{ name = "gitpython", specifier = ">=3.1.40" },
{ name = "google-cloud-storage", marker = "extra == 'all'", specifier = ">=2.10.0" },
{ name = "google-cloud-storage", marker = "extra == 'all-cloud'", specifier = ">=2.10.0" },
@@ -5778,7 +5921,11 @@ requires-dist = [
{ name = "mammoth", marker = "extra == 'docx'", specifier = ">=1.6.0" },
{ name = "mcp", marker = "extra == 'all'", specifier = ">=1.25,<2" },
{ name = "mcp", marker = "extra == 'mcp'", specifier = ">=1.25,<2" },
{ name = "nbformat", marker = "extra == 'all'", specifier = ">=5.9.0" },
{ name = "nbformat", marker = "extra == 'jupyter'", specifier = ">=5.9.0" },
{ name = "networkx", specifier = ">=3.0" },
{ name = "notion-client", marker = "extra == 'all'", specifier = ">=2.0.0" },
{ name = "notion-client", marker = "extra == 'notion'", specifier = ">=2.0.0" },
{ name = "numpy", marker = "extra == 'all'", specifier = ">=1.24.0" },
{ name = "numpy", marker = "extra == 'embedding'", specifier = ">=1.24.0" },
{ name = "openai", marker = "extra == 'all'", specifier = ">=1.0.0" },
@@ -5799,6 +5946,8 @@ requires-dist = [
{ name = "python-docx", marker = "extra == 'all'", specifier = ">=1.1.0" },
{ name = "python-docx", marker = "extra == 'docx'", specifier = ">=1.1.0" },
{ name = "python-dotenv", specifier = ">=1.1.1" },
{ name = "python-pptx", marker = "extra == 'all'", specifier = ">=0.6.21" },
{ name = "python-pptx", marker = "extra == 'pptx'", specifier = ">=0.6.21" },
{ name = "pyyaml", specifier = ">=6.0" },
{ name = "requests", specifier = ">=2.32.5" },
{ name = "scenedetect", extras = ["opencv"], marker = "extra == 'video-full'", specifier = ">=0.6.4" },
@@ -5807,6 +5956,8 @@ requires-dist = [
{ name = "sentence-transformers", marker = "extra == 'embedding'", specifier = ">=2.3.0" },
{ name = "sentence-transformers", marker = "extra == 'rag-upload'", specifier = ">=2.2.0" },
{ name = "sentence-transformers", marker = "extra == 'sentence-transformers'", specifier = ">=2.2.0" },
{ name = "slack-sdk", marker = "extra == 'all'", specifier = ">=3.27.0" },
{ name = "slack-sdk", marker = "extra == 'chat'", specifier = ">=3.27.0" },
{ name = "sse-starlette", marker = "extra == 'all'", specifier = ">=3.0.2" },
{ name = "sse-starlette", marker = "extra == 'mcp'", specifier = ">=3.0.2" },
{ name = "starlette", marker = "extra == 'all'", specifier = ">=0.48.0" },
@@ -5827,7 +5978,7 @@ requires-dist = [
{ name = "yt-dlp", marker = "extra == 'video'", specifier = ">=2024.12.0" },
{ name = "yt-dlp", marker = "extra == 'video-full'", specifier = ">=2024.12.0" },
]
provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "epub", "video", "video-full", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "embedding", "all"]
provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "epub", "video", "video-full", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "jupyter", "asciidoc", "pptx", "confluence", "notion", "rss", "chat", "embedding", "all"]
[package.metadata.requires-dev]
dev = [
@@ -5846,6 +5997,15 @@ dev = [
{ name = "starlette", specifier = ">=0.31.0" },
]
[[package]]
name = "slack-sdk"
version = "3.41.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/22/35/fc009118a13187dd9731657c60138e5a7c2dea88681a7f04dc406af5da7d/slack_sdk-3.41.0.tar.gz", hash = "sha256:eb61eb12a65bebeca9cb5d36b3f799e836ed2be21b456d15df2627cfe34076ca", size = 250568, upload-time = "2026-03-12T16:10:11.381Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a1/df/2e4be347ff98281b505cc0ccf141408cdd25eb5ca9f3830deb361b2472d3/slack_sdk-3.41.0-py2.py3-none-any.whl", hash = "sha256:bb18dcdfff1413ec448e759cf807ec3324090993d8ab9111c74081623b692a89", size = 313885, upload-time = "2026-03-12T16:10:09.811Z" },
]
[[package]]
name = "smmap"
version = "5.0.2"
@@ -6233,6 +6393,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl", hash = "sha256:26445eca388f82e72884e0d580d5464cd801a3ea01e63e5601bdff9ba6a48de2", size = 78540, upload-time = "2024-11-24T20:12:19.698Z" },
]
[[package]]
name = "traitlets"
version = "5.14.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/eb/79/72064e6a701c2183016abbbfedaba506d81e30e232a68c9f0d6f6fcd1574/traitlets-5.14.3.tar.gz", hash = "sha256:9ed0579d3502c94b4b3732ac120375cda96f923114522847de4b3bb98b96b6b7", size = 161621, upload-time = "2024-04-19T11:11:49.746Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/00/c0/8f5d070730d7836adc9c9b6408dec68c6ced86b304a9b26a14df072a6e8c/traitlets-5.14.3-py3-none-any.whl", hash = "sha256:b74e89e397b1ed28cc831db7aea759ba6640cb3de13090ca145426688ff1ac4f", size = 85359, upload-time = "2024-04-19T11:11:46.763Z" },
]
[[package]]
name = "transformers"
version = "5.1.0"
@@ -6753,6 +6922,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1f/f6/a933bd70f98e9cf3e08167fc5cd7aaaca49147e48411c0bd5ae701bb2194/wrapt-1.17.3-py3-none-any.whl", hash = "sha256:7171ae35d2c33d326ac19dd8facb1e82e5fd04ef8c6c0e394d7af55a55051c22", size = 23591, upload-time = "2025-08-12T05:53:20.674Z" },
]
[[package]]
name = "xlsxwriter"
version = "3.2.9"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/46/2c/c06ef49dc36e7954e55b802a8b231770d286a9758b3d936bd1e04ce5ba88/xlsxwriter-3.2.9.tar.gz", hash = "sha256:254b1c37a368c444eac6e2f867405cc9e461b0ed97a3233b2ac1e574efb4140c", size = 215940, upload-time = "2025-09-16T00:16:21.63Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3a/0c/3662f4a66880196a590b202f0db82d919dd2f89e99a27fadef91c4a33d41/xlsxwriter-3.2.9-py3-none-any.whl", hash = "sha256:9a5db42bc5dff014806c58a20b9eae7322a134abb6fce3c92c181bfb275ec5b3", size = 175315, upload-time = "2025-09-16T00:16:20.108Z" },
]
[[package]]
name = "xxhash"
version = "3.6.0"