
Frequently Asked Questions (FAQ)

Version: 3.2.0
Last Updated: 2026-03-15


General Questions

What is Skill Seekers?

Skill Seekers is a Python tool that converts 17 source types — documentation websites, GitHub repos, PDFs, videos, Word docs, EPUB books, Jupyter notebooks, local HTML files, OpenAPI specs, AsciiDoc, PowerPoint, RSS/Atom feeds, man pages, Confluence wikis, Notion pages, Slack/Discord exports, and local codebases — into AI-ready formats for 16+ platforms: LLM platforms (Claude, Gemini, OpenAI), RAG frameworks (LangChain, LlamaIndex, Haystack), vector databases (ChromaDB, FAISS, Weaviate, Qdrant, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline, Continue.dev).

Use Cases:

  • Create custom documentation skills for your favorite frameworks
  • Analyze GitHub repositories and extract code patterns
  • Convert PDF manuals into searchable AI skills
  • Import knowledge from Confluence, Notion, or Slack/Discord
  • Extract content from videos (YouTube, Vimeo, local files)
  • Convert Jupyter notebooks, EPUB books, or PowerPoint slides into skills
  • Parse OpenAPI/Swagger specs into API reference skills
  • Combine multiple sources (docs + code + PDFs + more) into unified skills

Which platforms are supported?

Supported Platforms (16+):

LLM Platforms:

  1. Claude AI - ZIP format with YAML frontmatter
  2. Google Gemini - tar.gz format for Grounded Generation
  3. OpenAI ChatGPT - ZIP format for Vector Stores
  4. Generic Markdown - ZIP format with markdown files

RAG Frameworks:

  5. LangChain - Document objects for QA chains and agents
  6. LlamaIndex - TextNodes for query engines
  7. Haystack - Document objects for enterprise RAG

Vector Databases:

  8. ChromaDB - Direct collection upload
  9. FAISS - Index files for local similarity search
  10. Weaviate - Vector objects with schema creation
  11. Qdrant - Points with payload indexing
  12. Pinecone - Ready-to-upsert format

AI Coding Assistants:

  13. Cursor - .cursorrules persistent context
  14. Windsurf - .windsurfrules AI coding rules
  15. Cline - .clinerules + MCP integration
  16. Continue.dev - HTTP context server (all IDEs)

Each platform has a dedicated adaptor for optimal formatting and upload.

Is it free to use?

Tool: Yes, Skill Seekers is 100% free and open-source (MIT license).

API Costs:

  • Scraping: Free (just bandwidth)
  • AI Enhancement (API mode): ~$0.15-0.30 per skill (Claude API)
  • AI Enhancement (LOCAL mode): Free! (uses your Claude Code Max plan)
  • Upload: Free (platform storage limits apply)

Recommendation: Use LOCAL mode for free AI enhancement or skip enhancement entirely.

How do I set up video extraction?

Quick setup:

# 1. Install video support
pip install skill-seekers[video-full]

# 2. Auto-detect GPU and install visual deps
skill-seekers video --setup

The --setup command auto-detects your GPU vendor (NVIDIA CUDA, AMD ROCm, or CPU-only) and installs the correct PyTorch variant along with easyocr and other visual extraction dependencies. This avoids the ~2GB NVIDIA CUDA download that would happen if easyocr were installed via pip on non-NVIDIA systems.

What it detects:

  • NVIDIA: Uses nvidia-smi to find CUDA version → installs matching cu124/cu121/cu118 PyTorch
  • AMD: Uses rocminfo to find ROCm version → installs matching ROCm PyTorch
  • CPU-only: Installs lightweight CPU-only PyTorch
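The detection logic above can be sketched roughly as follows. This is an illustrative Python snippet, not the actual `--setup` implementation: it only checks whether the vendor CLI tools are on PATH, whereas the real command also parses driver output to select a matching PyTorch wheel.

```python
import shutil

def detect_gpu_vendor() -> str:
    # Check for vendor utilities on PATH (sketch only); the real
    # --setup command additionally queries nvidia-smi/rocminfo output
    # to pick a matching PyTorch build (cu124/cu121/cu118 or ROCm).
    if shutil.which("nvidia-smi"):
        return "nvidia"
    if shutil.which("rocminfo"):
        return "amd"
    return "cpu"
```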

What source types are supported?

Skill Seekers supports 17 source types:

| # | Source Type | CLI Command | Auto-Detection |
|---|-------------|-------------|----------------|
| 1 | Documentation (web) | scrape / create &lt;url&gt; | HTTP/HTTPS URLs |
| 2 | GitHub repo | github / create owner/repo | owner/repo or github.com URLs |
| 3 | PDF | pdf / create file.pdf | .pdf extension |
| 4 | Word (.docx) | word / create file.docx | .docx extension |
| 5 | EPUB | epub / create file.epub | .epub extension |
| 6 | Video | video / create &lt;url/file&gt; | YouTube/Vimeo URLs, video extensions |
| 7 | Local codebase | analyze / create ./path | Directory paths |
| 8 | Jupyter Notebook | jupyter / create file.ipynb | .ipynb extension |
| 9 | Local HTML | html / create file.html | .html/.htm extensions |
| 10 | OpenAPI/Swagger | openapi / create spec.yaml | .yaml/.yml with OpenAPI content |
| 11 | AsciiDoc | asciidoc / create file.adoc | .adoc/.asciidoc extensions |
| 12 | PowerPoint | pptx / create file.pptx | .pptx extension |
| 13 | RSS/Atom | rss / create feed.rss | .rss/.atom extensions |
| 14 | Man pages | manpage / create cmd.1 | .1-.8/.man extensions |
| 15 | Confluence | confluence | API or export directory |
| 16 | Notion | notion | API or export directory |
| 17 | Slack/Discord | chat | Export directory or API |

The create command auto-detects the source type from your input, so you often don't need to specify a subcommand.
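The auto-detection idea can be sketched in a few lines. This is a simplified, hypothetical illustration, not the actual detector — the real `create` command also inspects file contents (e.g. OpenAPI keys inside `.yaml` files) and handles many more extensions:

```python
from pathlib import Path

# Illustrative subset of the extension mapping from the table above.
EXTENSION_MAP = {
    ".pdf": "pdf", ".docx": "word", ".epub": "epub",
    ".ipynb": "jupyter", ".html": "html", ".htm": "html",
    ".adoc": "asciidoc", ".pptx": "pptx",
}

def detect_source_type(target: str) -> str:
    # URLs: distinguish GitHub from generic documentation sites.
    if target.startswith(("http://", "https://")):
        return "github" if "github.com" in target else "docs"
    # Directory paths map to codebase analysis.
    if Path(target).is_dir():
        return "codebase"
    # Otherwise fall back to the file extension.
    return EXTENSION_MAP.get(Path(target).suffix.lower(), "unknown")
```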

How long does it take to create a skill?

Typical Times:

  • Documentation scraping: 5-45 minutes (depends on size)
  • GitHub analysis: 1-5 minutes (basic) or 20-60 minutes (C3.x deep analysis)
  • PDF extraction: 30 seconds - 5 minutes
  • Video extraction: 2-10 minutes (depends on length and visual analysis)
  • Word/EPUB/PPTX: 10-60 seconds
  • Jupyter notebook: 10-30 seconds
  • OpenAPI spec: 5-15 seconds
  • Confluence/Notion import: 1-5 minutes (depends on space size)
  • AI enhancement: 30-60 seconds (LOCAL or API mode)
  • Total workflow: 10-60 minutes

Speed Tips:

  • Use --async for 2-3x faster scraping
  • Use --skip-scrape to rebuild without re-scraping
  • Skip AI enhancement for faster workflow

Installation & Setup

How do I install Skill Seekers?

# Basic installation
pip install skill-seekers

# With all platform support
pip install skill-seekers[all-llms]

# Development installation
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -e ".[all-llms,dev]"

What Python version do I need?

Required: Python 3.10 or higher
Tested on: Python 3.10, 3.11, 3.12, 3.13
OS Support: Linux, macOS, Windows (WSL recommended)

Check your version:

python --version  # Should be 3.10+

Why do I get "No module named 'skill_seekers'" error?

Common Causes:

  1. Package not installed
  2. Wrong Python environment

Solutions:

# Install package
pip install skill-seekers

# Or for development
pip install -e .

# Verify installation
skill-seekers --version

How do I set up API keys?

# Claude AI (for enhancement and upload)
export ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini (for upload)
export GOOGLE_API_KEY=AIza...

# OpenAI ChatGPT (for upload)
export OPENAI_API_KEY=sk-...

# GitHub (for higher rate limits)
export GITHUB_TOKEN=ghp_...

# Make permanent (add to ~/.bashrc or ~/.zshrc)
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
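If you script your workflows, a quick pre-flight check for unset keys can save a failed run. This is a hypothetical helper (not part of the CLI); the variable names match the exports above:

```python
import os

# Environment variables used by upload/enhancement features.
KEYS = ["ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY", "GITHUB_TOKEN"]

def missing_keys(env=None):
    # Return the names of any keys that are unset or empty.
    env = os.environ if env is None else env
    return [k for k in KEYS if not env.get(k)]
```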

Usage Questions

How do I scrape documentation?

Using preset config:

skill-seekers scrape --config react

Using custom URL:

skill-seekers scrape --base-url https://docs.example.com --name my-framework

From custom config file:

skill-seekers scrape --config configs/my-framework.json

Can I analyze GitHub repositories?

Yes! Skill Seekers has powerful GitHub analysis:

# Basic analysis (fast)
skill-seekers github https://github.com/facebook/react

# Deep C3.x analysis (includes patterns, tests, guides)
skill-seekers github https://github.com/vercel/next.js --analysis-depth c3x

C3.x Features:

  • Design pattern detection (10 GoF patterns)
  • Test example extraction
  • How-to guide generation
  • Configuration pattern extraction
  • Architectural overview
  • API reference generation

Can I extract content from PDFs?

Yes! PDF extraction with OCR support:

# Basic PDF extraction
skill-seekers pdf manual.pdf --name product-manual

# With OCR (for scanned PDFs)
skill-seekers pdf scanned.pdf --enable-ocr

# Extract images and tables
skill-seekers pdf document.pdf --extract-images --extract-tables

How do I scrape a Jupyter Notebook?

# Extract cells, outputs, and markdown from a notebook
skill-seekers jupyter analysis.ipynb --name data-analysis

# Or use auto-detection
skill-seekers create analysis.ipynb

Jupyter extraction preserves code cells, markdown cells, and cell outputs. It works with .ipynb files from JupyterLab, Google Colab, and other notebook environments.
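Under the hood, `.ipynb` files are plain JSON (nbformat 4) with a top-level `cells` list, so the extraction idea is straightforward. A minimal sketch, not the actual parser:

```python
import json

def extract_cells(nb_json: str):
    # Each cell in an nbformat-4 notebook has a "cell_type"
    # ("code", "markdown", ...) and a "source" list of line strings.
    nb = json.loads(nb_json)
    return [(c["cell_type"], "".join(c["source"])) for c in nb.get("cells", [])]
```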

How do I import from Confluence or Notion?

Confluence:

# From Confluence Cloud API
export CONFLUENCE_URL=https://yourorg.atlassian.net
export CONFLUENCE_TOKEN=your-api-token
export CONFLUENCE_EMAIL=your-email@example.com
skill-seekers confluence --space MYSPACE --name my-wiki

# From a Confluence HTML/XML export directory
skill-seekers confluence --export-dir ./confluence-export --name my-wiki

Notion:

# From Notion API
export NOTION_TOKEN=secret_...
skill-seekers notion --database DATABASE_ID --name my-notes

# From a Notion HTML/Markdown export directory
skill-seekers notion --export-dir ./notion-export --name my-notes

How do I convert Word, EPUB, or PowerPoint files?

# Word document
skill-seekers word report.docx --name quarterly-report

# EPUB book
skill-seekers epub handbook.epub --name dev-handbook

# PowerPoint presentation
skill-seekers pptx slides.pptx --name training-deck

# Or use auto-detection for any of them
skill-seekers create report.docx
skill-seekers create handbook.epub
skill-seekers create slides.pptx

How do I parse an OpenAPI/Swagger spec?

# From a local YAML/JSON file
skill-seekers openapi api-spec.yaml --name my-api

# Auto-detection works too
skill-seekers create api-spec.yaml

OpenAPI extraction parses endpoints, schemas, parameters, and examples into a structured API reference skill.
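The core of endpoint extraction is a walk over the spec's `paths` object. A simplified sketch of the idea (the real parser also resolves schemas, parameters, and examples):

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}

def list_endpoints(spec: dict):
    # Collect (METHOD, path, summary) triples from an OpenAPI 3.x
    # "paths" object; non-method keys (e.g. "parameters") are skipped.
    out = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method.lower() in HTTP_METHODS:
                out.append((method.upper(), path, op.get("summary", "")))
    return out
```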

How do I extract content from RSS feeds or man pages?

# RSS/Atom feed
skill-seekers rss https://blog.example.com/feed.xml --name blog-feed

# Man page
skill-seekers manpage grep.1 --name grep-manual

How do I import from Slack or Discord?

# From a Slack export directory
skill-seekers chat --platform slack --export-dir ./slack-export --name team-knowledge

# From a Discord export directory
skill-seekers chat --platform discord --export-dir ./discord-export --name server-archive

Can I combine multiple sources?

Yes! Unified multi-source scraping:

Create unified config (configs/unified/my-framework.json):

{
  "name": "my-framework",
  "sources": {
    "documentation": {
      "type": "docs",
      "base_url": "https://docs.example.com"
    },
    "github": {
      "type": "github",
      "repo_url": "https://github.com/org/repo"
    },
    "pdf": {
      "type": "pdf",
      "pdf_path": "manual.pdf"
    }
  }
}

Run unified scraping:

skill-seekers unified --config configs/unified/my-framework.json
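Before running, you can sanity-check a unified config's shape yourself. This is a hypothetical pre-flight check mirroring the structure above — the `unified` command performs its own, more thorough validation:

```python
# Subset of source types shown in the example above, for illustration.
KNOWN_TYPES = {"docs", "github", "pdf"}

def validate_unified(config: dict) -> list[str]:
    # Return a list of human-readable problems; empty means OK.
    errors = []
    if "name" not in config:
        errors.append("missing top-level 'name'")
    for key, src in config.get("sources", {}).items():
        if src.get("type") not in KNOWN_TYPES:
            errors.append(f"source '{key}': unexpected type {src.get('type')!r}")
    return errors
```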

How do I upload skills to platforms?

# Upload to Claude AI
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers upload output/react-claude.zip --target claude

# Upload to Google Gemini
export GOOGLE_API_KEY=AIza...
skill-seekers upload output/react-gemini.tar.gz --target gemini

# Upload to OpenAI ChatGPT
export OPENAI_API_KEY=sk-...
skill-seekers upload output/react-openai.zip --target openai

Or use complete workflow:

skill-seekers install react --target claude --upload

Platform-Specific Questions

What's the difference between platforms?

| Feature | Claude AI | Google Gemini | OpenAI ChatGPT | Markdown |
|---------|-----------|---------------|----------------|----------|
| Format | ZIP + YAML | tar.gz | ZIP | ZIP |
| Upload API | Projects API | Corpora API | Vector Stores | N/A |
| Model | Sonnet 4.5 | Gemini 2.0 Flash | GPT-4o | N/A |
| Max Size | 32MB | 10MB | 512MB | N/A |
| Use Case | Claude Code | Grounded Gen | ChatGPT Custom | Export |

Choose based on:

  • Claude AI: Best for Claude Code integration
  • Google Gemini: Best for Grounded Generation in Gemini
  • OpenAI ChatGPT: Best for ChatGPT Custom GPTs
  • Markdown: Generic export for other tools

Can I use multiple platforms at once?

Yes! Package and upload to all platforms:

# Package for all platforms
for platform in claude gemini openai markdown; do
  skill-seekers package output/react/ --target $platform
done

# Upload to all platforms
skill-seekers install react --target claude,gemini,openai --upload

How do I use skills in Claude Code?

  1. Install skill to Claude Code directory:
skill-seekers install-agent --skill-dir output/react/ --agent-dir ~/.claude/skills/react
  2. Use in Claude Code:
Use the react skill to explain React hooks
  3. Or upload to Claude AI:
skill-seekers upload output/react-claude.zip --target claude

Features & Capabilities

What is AI enhancement?

AI enhancement transforms basic skills (2-3/10 quality) into production-ready skills (8-9/10 quality) using LLMs.

Two Modes:

  1. API Mode: Direct Claude API calls (fast, costs ~$0.15-0.30)
  2. LOCAL Mode: Uses Claude Code CLI (free with your Max plan)

What it improves:

  • Better organization and structure
  • Clearer explanations
  • More examples and use cases
  • Better cross-references
  • Improved searchability

Usage:

# API mode (if ANTHROPIC_API_KEY is set)
skill-seekers enhance output/react/

# LOCAL mode (free!)
skill-seekers enhance output/react/ --mode LOCAL

# Background mode
skill-seekers enhance output/react/ --background
skill-seekers enhance-status output/react/ --watch

What are C3.x features?

C3.x features are advanced codebase analysis capabilities:

  • C3.1: Design pattern detection (Singleton, Factory, Strategy, etc.)
  • C3.2: Test example extraction (real usage examples from tests)
  • C3.3: How-to guide generation (educational guides from test workflows)
  • C3.4: Configuration pattern extraction (env vars, config files)
  • C3.5: Architectural overview (system architecture analysis)
  • C3.6: AI enhancement (Claude API integration for insights)
  • C3.7: Architectural pattern detection (MVC, MVVM, Repository, etc.)
  • C3.8: Standalone codebase scraping (300+ line SKILL.md from code alone)

Enable C3.x:

# All C3.x features enabled by default
skill-seekers codebase --directory /path/to/repo

# Skip specific features
skill-seekers codebase --directory . --skip-patterns --skip-how-to-guides

What are router skills?

Router skills help Claude navigate large documentation (>500 pages) by providing a table of contents and keyword index.

When to use:

  • Documentation with 500+ pages
  • Complex multi-section docs
  • Large API references

Generate router:

skill-seekers generate-router output/large-docs/
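The keyword-index half of a router skill can be illustrated with a toy example. This is not the actual generator, just a sketch of the idea: map each keyword to the pages that mention it, so the model can jump straight to the right section instead of reading everything.

```python
from collections import defaultdict

def build_keyword_index(page_titles):
    # Toy keyword -> pages map built from page titles only; a real
    # router would also index section headings and body keywords.
    index = defaultdict(list)
    for title in page_titles:
        for word in title.lower().split():
            index[word].append(title)
    return dict(index)
```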

What preset configurations are available?

24 preset configs:

  • Web: react, vue, angular, svelte, nextjs
  • Python: django, flask, fastapi, sqlalchemy, pytest
  • Game Dev: godot, pygame, unity
  • DevOps: docker, kubernetes, terraform, ansible
  • Unified: react-unified, vue-unified, nextjs-unified, etc.

List all:

skill-seekers list-configs

Troubleshooting

Scraping is very slow, how can I speed it up?

Solutions:

  1. Use async mode (2-3x faster):
skill-seekers scrape --config react --async
  2. Increase rate limit (faster requests):
{
  "rate_limit": 0.1  // Faster (but may hit rate limits)
}
  3. Limit pages:
{
  "max_pages": 100  // Stop after 100 pages
}

Why are some pages missing?

Common Causes:

  1. URL patterns exclude them
  2. Max pages limit reached
  3. BFS didn't reach them

Solutions:

# Check URL patterns in config
{
  "url_patterns": {
    "include": ["/docs/"],  // Make sure your pages match
    "exclude": []           // Remove overly broad exclusions
  }
}

# Increase max pages
{
  "max_pages": 1000  // Default is 500
}

# Use verbose mode to see what's being scraped
skill-seekers scrape --config react --verbose
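A quick way to reason about why a page was skipped is to replay the include/exclude logic by hand. This simplified substring filter mirrors the `url_patterns` config above (the actual matcher may differ, e.g. in pattern syntax):

```python
def url_allowed(url: str, include, exclude) -> bool:
    # Exclusions win; otherwise the URL must match at least one
    # include pattern (an empty include list allows everything).
    if any(pat in url for pat in exclude):
        return False
    return not include or any(pat in url for pat in include)
```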

How do I fix "NetworkError: Connection failed"?

Solutions:

  1. Check internet connection
  2. Verify URL is accessible:
curl -I https://docs.example.com
  3. Increase timeout:
{
  "timeout": 30  // 30 seconds
}
  4. Check rate limiting:
{
  "rate_limit": 1.0  // Slower requests
}

Tests are failing, what should I do?

Quick fixes:

# Ensure package is installed
pip install -e ".[all-llms,dev]"

# Clear caches
rm -rf .pytest_cache/ **/__pycache__/

# Run specific failing test
pytest tests/test_file.py::test_name -vv

# Check for missing dependencies
pip install -e ".[all-llms,dev]"

If still failing:

  1. Check Troubleshooting Guide
  2. Report issue on GitHub

MCP Server Questions

How do I start the MCP server?

# stdio mode (Claude Code, VS Code + Cline)
skill-seekers-mcp

# HTTP mode (Cursor, Windsurf, IntelliJ)
skill-seekers-mcp --transport http --port 8765

What MCP tools are available?

26 MCP tools:

Core Tools (9):

  1. list_configs - List preset configurations
  2. generate_config - Generate config from docs URL
  3. validate_config - Validate config structure
  4. estimate_pages - Estimate page count
  5. scrape_docs - Scrape documentation
  6. package_skill - Package to .zip (supports --format and --target)
  7. upload_skill - Upload to platform (supports --target)
  8. enhance_skill - AI enhancement
  9. install_skill - Complete workflow

Extended Tools (11):

  10. scrape_github - GitHub analysis
  11. scrape_pdf - PDF extraction
  12. unified_scrape - Multi-source scraping
  13. merge_sources - Merge docs + code
  14. detect_conflicts - Find discrepancies
  15. split_config - Split large configs
  16. generate_router - Generate router skills
  17. add_config_source - Register git repos
  18. fetch_config - Fetch configs from git
  19. list_config_sources - List registered sources
  20. remove_config_source - Remove config source

Vector DB Tools (4):

  21. export_to_chroma - Export to ChromaDB
  22. export_to_weaviate - Export to Weaviate
  23. export_to_faiss - Export to FAISS
  24. export_to_qdrant - Export to Qdrant

Cloud Tools (2):

  25. cloud_upload - Upload to S3/GCS/Azure
  26. cloud_download - Download from cloud storage

How do I configure MCP for Claude Code?

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "skill-seekers": {
      "command": "skill-seekers-mcp"
    }
  }
}

Restart Claude Code, then use:

Use skill-seekers MCP tools to scrape React documentation

Advanced Questions

Can I use Skill Seekers programmatically?

Yes! Full API for Python integration:

from skill_seekers.cli.doc_scraper import scrape_all, build_skill
from skill_seekers.cli.adaptors import get_adaptor

# Scrape documentation
pages = scrape_all(
    base_url='https://docs.example.com',
    selectors={'main_content': 'article'},
    config={'name': 'example'}
)

# Build skill
skill_path = build_skill(
    config_name='example',
    output_dir='output/example'
)

# Package for platform
adaptor = get_adaptor('claude')
package_path = adaptor.package(skill_path, 'output/')

See: API Reference

How do I create custom configurations?

Create config file (configs/my-framework.json):

{
  "name": "my-framework",
  "description": "My custom framework documentation",
  "base_url": "https://docs.example.com/",
  "selectors": {
    "main_content": "article",  // CSS selector
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs/", "/api/"],
    "exclude": ["/blog/", "/changelog/"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}

Use config:

skill-seekers scrape --config configs/my-framework.json

Can I contribute preset configs?

Yes! We welcome config contributions:

  1. Create config in configs/ directory
  2. Test it thoroughly:
skill-seekers scrape --config configs/your-framework.json
  3. Submit PR on GitHub

Guidelines:

  • Name: {framework-name}.json
  • Include all required fields
  • Add to appropriate category
  • Test with real documentation

How do I debug scraping issues?

# Verbose output
skill-seekers scrape --config react --verbose

# Dry run (no actual scraping)
skill-seekers scrape --config react --dry-run

# Single page test
skill-seekers scrape --base-url https://docs.example.com/intro --max-pages 1

# Check selectors
skill-seekers validate-config configs/react.json

Getting More Help

Where can I find documentation?

Main Documentation:

Guides:

How do I report bugs?

  1. Check existing issues: https://github.com/yusufkaraaslan/Skill_Seekers/issues
  2. Create new issue with:
    • Skill Seekers version (skill-seekers --version)
    • Python version (python --version)
    • Operating system
    • Config file (if relevant)
    • Error message and stack trace
    • Steps to reproduce

How do I request features?

  1. Check roadmap: ROADMAP.md
  2. Create feature request: https://github.com/yusufkaraaslan/Skill_Seekers/issues
  3. Join discussions: https://github.com/yusufkaraaslan/Skill_Seekers/discussions

Is there a community?

Yes!


Version: 3.2.0
Last Updated: 2026-03-15

Questions? Ask on GitHub Discussions