Update documentation for large documentation features
Comprehensive documentation updates for large docs support: README.md: - Add "Large Documentation Support" to key features - Add "Router/Hub Skills" feature highlight - Add "Checkpoint/Resume" feature highlight - Update MCP tools count: 6 → 8 - Add complete section 7: Large Documentation Support (10K-40K+ Pages) - Split strategies: auto, category, router, size - Parallel scraping workflow - Configuration examples - Benefits and use cases - Add section 8: Checkpoint/Resume for Long Scrapes - Configuration examples - Resume/fresh workflow - Benefits and features - Update documentation links to include LARGE_DOCUMENTATION.md - Update MCP guide links to reflect 8 tools docs/CLAUDE.md: - Add resume/checkpoint commands - Add large documentation commands (split, router, package_multi) - Update MCP integration section (8 tools) - Expand directory structure to show new files - Add split_strategy, split_config, checkpoint config parameters - Add "Large Documentation Support" and "Checkpoint/Resume" features - Add complete large documentation workflow (40K pages example) - Update all command paths to use cli/ prefix mcp/README.md: - Update tool count: 6 → 8 - Add tool 7: split_config with full documentation - Add tool 8: generate_router with full documentation - Add "Large Documentation (40K Pages)" workflow example - Update test coverage: 25 → 31 tests - Update performance table with parallel scraping metrics - Document all split strategies docs/MCP_SETUP.md: - Update verified tools count: 6 → 8 - Update test count: 25 → 31 All documentation now comprehensively covers: - Large documentation handling (10K-40K+ pages) - Router/hub architecture - Config splitting strategies - Checkpoint/resume functionality - Parallel scraping workflows - Complete MCP integration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
116
README.md
116
README.md
@@ -30,10 +30,14 @@ Skill Seeker is an automated tool that transforms any documentation website into
|
|||||||
✅ **Universal Scraper** - Works with ANY documentation website
|
✅ **Universal Scraper** - Works with ANY documentation website
|
||||||
✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
|
✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
|
||||||
✅ **MCP Server for Claude Code** - Use directly from Claude Code with natural language
|
✅ **MCP Server for Claude Code** - Use directly from Claude Code with natural language
|
||||||
|
✅ **Large Documentation Support** - Handle 10K-40K+ page docs with intelligent splitting
|
||||||
|
✅ **Router/Hub Skills** - Intelligent routing to specialized sub-skills
|
||||||
✅ **8 Ready-to-Use Presets** - Godot, React, Vue, Django, FastAPI, and more
|
✅ **8 Ready-to-Use Presets** - Godot, React, Vue, Django, FastAPI, and more
|
||||||
✅ **Smart Categorization** - Automatically organizes content by topic
|
✅ **Smart Categorization** - Automatically organizes content by topic
|
||||||
✅ **Code Language Detection** - Recognizes Python, JavaScript, C++, GDScript, etc.
|
✅ **Code Language Detection** - Recognizes Python, JavaScript, C++, GDScript, etc.
|
||||||
✅ **No API Costs** - FREE local enhancement using Claude Code Max
|
✅ **No API Costs** - FREE local enhancement using Claude Code Max
|
||||||
|
✅ **Checkpoint/Resume** - Never lose progress on long scrapes
|
||||||
|
✅ **Parallel Scraping** - Process multiple skills simultaneously
|
||||||
✅ **Caching System** - Scrape once, rebuild instantly
|
✅ **Caching System** - Scrape once, rebuild instantly
|
||||||
✅ **Fully Tested** - 96 tests with 100% pass rate
|
✅ **Fully Tested** - 96 tests with 100% pass rate
|
||||||
|
|
||||||
@@ -110,12 +114,13 @@ Package skill at output/react/
|
|||||||
- ✅ No manual CLI commands
|
- ✅ No manual CLI commands
|
||||||
- ✅ Natural language interface
|
- ✅ Natural language interface
|
||||||
- ✅ Integrated with your workflow
|
- ✅ Integrated with your workflow
|
||||||
- ✅ 6 tools available instantly
|
- ✅ 8 tools available instantly (includes large docs support!)
|
||||||
- ✅ **Tested and working** in production
|
- ✅ **Tested and working** in production
|
||||||
|
|
||||||
**Full guides:**
|
**Full guides:**
|
||||||
- 📘 [MCP Setup Guide](docs/MCP_SETUP.md) - Complete installation instructions
|
- 📘 [MCP Setup Guide](docs/MCP_SETUP.md) - Complete installation instructions
|
||||||
- 🧪 [MCP Testing Guide](docs/TEST_MCP_IN_CLAUDE_CODE.md) - Test all 6 tools
|
- 🧪 [MCP Testing Guide](docs/TEST_MCP_IN_CLAUDE_CODE.md) - Test all 8 tools
|
||||||
|
- 📦 [Large Documentation Guide](docs/LARGE_DOCUMENTATION.md) - Handle 10K-40K+ pages
|
||||||
|
|
||||||
### Method 2: CLI (Traditional)
|
### Method 2: CLI (Traditional)
|
||||||
|
|
||||||
@@ -246,22 +251,22 @@ python3 doc_scraper.py --config configs/react.json
|
|||||||
python3 doc_scraper.py --config configs/react.json --skip-scrape
|
python3 doc_scraper.py --config configs/react.json --skip-scrape
|
||||||
```
|
```
|
||||||
|
|
||||||
### 6. AI-Powered SKILL.md Enhancement (NEW!)
|
### 6. AI-Powered SKILL.md Enhancement
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Option 1: During scraping (API-based, requires API key)
|
# Option 1: During scraping (API-based, requires API key)
|
||||||
pip3 install anthropic
|
pip3 install anthropic
|
||||||
export ANTHROPIC_API_KEY=sk-ant-...
|
export ANTHROPIC_API_KEY=sk-ant-...
|
||||||
python3 doc_scraper.py --config configs/react.json --enhance
|
python3 cli/doc_scraper.py --config configs/react.json --enhance
|
||||||
|
|
||||||
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
|
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
|
||||||
python3 doc_scraper.py --config configs/react.json --enhance-local
|
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
|
||||||
|
|
||||||
# Option 3: After scraping (API-based, standalone)
|
# Option 3: After scraping (API-based, standalone)
|
||||||
python3 enhance_skill.py output/react/
|
python3 cli/enhance_skill.py output/react/
|
||||||
|
|
||||||
# Option 4: After scraping (LOCAL, no API key, standalone)
|
# Option 4: After scraping (LOCAL, no API key, standalone)
|
||||||
python3 enhance_skill_local.py output/react/
|
python3 cli/enhance_skill_local.py output/react/
|
||||||
```
|
```
|
||||||
|
|
||||||
**What it does:**
|
**What it does:**
|
||||||
@@ -281,6 +286,101 @@ python3 enhance_skill_local.py output/react/
|
|||||||
- Takes 30-60 seconds
|
- Takes 30-60 seconds
|
||||||
- Quality: 9/10 (comparable to API version)
|
- Quality: 9/10 (comparable to API version)
|
||||||
|
|
||||||
|
### 7. Large Documentation Support (10K-40K+ Pages)
|
||||||
|
|
||||||
|
**For massive documentation sites like Godot (40K pages), AWS, or Microsoft Docs:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Estimate first (discover page count)
|
||||||
|
python3 cli/estimate_pages.py configs/godot.json
|
||||||
|
|
||||||
|
# 2. Auto-split into focused sub-skills
|
||||||
|
python3 cli/split_config.py configs/godot.json --strategy router
|
||||||
|
|
||||||
|
# Creates:
|
||||||
|
# - godot-scripting.json (5K pages)
|
||||||
|
# - godot-2d.json (8K pages)
|
||||||
|
# - godot-3d.json (10K pages)
|
||||||
|
# - godot-physics.json (6K pages)
|
||||||
|
# - godot-shaders.json (11K pages)
|
||||||
|
|
||||||
|
# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
|
||||||
|
for config in configs/godot-*.json; do
|
||||||
|
python3 cli/doc_scraper.py --config $config &
|
||||||
|
done
|
||||||
|
wait
|
||||||
|
|
||||||
|
# 4. Generate intelligent router/hub skill
|
||||||
|
python3 cli/generate_router.py configs/godot-*.json
|
||||||
|
|
||||||
|
# 5. Package all skills
|
||||||
|
python3 cli/package_multi.py output/godot*/
|
||||||
|
|
||||||
|
# 6. Upload all .zip files to Claude
|
||||||
|
# Users just ask questions naturally!
|
||||||
|
# Router automatically directs to the right sub-skill!
|
||||||
|
```
|
||||||
|
|
||||||
|
**Split Strategies:**
|
||||||
|
- **auto** - Intelligently detects best strategy based on page count
|
||||||
|
- **category** - Split by documentation categories (scripting, 2d, 3d, etc.)
|
||||||
|
- **router** - Create hub skill + specialized sub-skills (RECOMMENDED)
|
||||||
|
- **size** - Split every N pages (for docs without clear categories)
|
||||||
|
|
||||||
|
**Benefits:**
|
||||||
|
- ✅ Faster scraping (parallel execution)
|
||||||
|
- ✅ More focused skills (better Claude performance)
|
||||||
|
- ✅ Easier maintenance (update one topic at a time)
|
||||||
|
- ✅ Natural user experience (router handles routing)
|
||||||
|
- ✅ Avoids context window limits
|
||||||
|
|
||||||
|
**Configuration:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"name": "godot",
|
||||||
|
"max_pages": 40000,
|
||||||
|
"split_strategy": "router",
|
||||||
|
"split_config": {
|
||||||
|
"target_pages_per_skill": 5000,
|
||||||
|
"create_router": true,
|
||||||
|
"split_by_categories": ["scripting", "2d", "3d", "physics"]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Full Guide:** [Large Documentation Guide](docs/LARGE_DOCUMENTATION.md)
|
||||||
|
|
||||||
|
### 8. Checkpoint/Resume for Long Scrapes
|
||||||
|
|
||||||
|
**Never lose progress on long-running scrapes:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Enable in config
|
||||||
|
{
|
||||||
|
"checkpoint": {
|
||||||
|
"enabled": true,
|
||||||
|
"interval": 1000 // Save every 1000 pages
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# If scrape is interrupted (Ctrl+C or crash)
|
||||||
|
python3 cli/doc_scraper.py --config configs/godot.json --resume
|
||||||
|
|
||||||
|
# Resume from last checkpoint
|
||||||
|
✅ Resuming from checkpoint (12,450 pages scraped)
|
||||||
|
⏭️ Skipping 12,450 already-scraped pages
|
||||||
|
🔄 Continuing from where we left off...
|
||||||
|
|
||||||
|
# Start fresh (clear checkpoint)
|
||||||
|
python3 cli/doc_scraper.py --config configs/godot.json --fresh
|
||||||
|
```
|
||||||
|
|
||||||
|
**Benefits:**
|
||||||
|
- ✅ Auto-saves every 1000 pages (configurable)
|
||||||
|
- ✅ Saves on interruption (Ctrl+C)
|
||||||
|
- ✅ Resume with `--resume` flag
|
||||||
|
- ✅ Never lose hours of scraping progress
|
||||||
|
|
||||||
## 🎯 Complete Workflows
|
## 🎯 Complete Workflows
|
||||||
|
|
||||||
### First Time (With Scraping + Enhancement)
|
### First Time (With Scraping + Enhancement)
|
||||||
@@ -552,8 +652,10 @@ python3 doc_scraper.py --config configs/godot.json
|
|||||||
## 📚 Documentation
|
## 📚 Documentation
|
||||||
|
|
||||||
- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps
|
- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps
|
||||||
|
- **[docs/LARGE_DOCUMENTATION.md](docs/LARGE_DOCUMENTATION.md)** - Handle 10K-40K+ page docs
|
||||||
- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
|
- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
|
||||||
- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
|
- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
|
||||||
|
- **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP integration setup
|
||||||
- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Technical architecture
|
- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Technical architecture
|
||||||
- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure
|
- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure
|
||||||
|
|
||||||
|
|||||||
161
docs/CLAUDE.md
161
docs/CLAUDE.md
@@ -16,26 +16,50 @@ pip3 install requests beautifulsoup4
|
|||||||
|
|
||||||
### Run with a preset configuration
|
### Run with a preset configuration
|
||||||
```bash
|
```bash
|
||||||
python3 doc_scraper.py --config configs/godot.json
|
python3 cli/doc_scraper.py --config configs/godot.json
|
||||||
python3 doc_scraper.py --config configs/react.json
|
python3 cli/doc_scraper.py --config configs/react.json
|
||||||
python3 doc_scraper.py --config configs/vue.json
|
python3 cli/doc_scraper.py --config configs/vue.json
|
||||||
python3 doc_scraper.py --config configs/django.json
|
python3 cli/doc_scraper.py --config configs/django.json
|
||||||
python3 doc_scraper.py --config configs/fastapi.json
|
python3 cli/doc_scraper.py --config configs/fastapi.json
|
||||||
```
|
```
|
||||||
|
|
||||||
### Interactive mode (for new frameworks)
|
### Interactive mode (for new frameworks)
|
||||||
```bash
|
```bash
|
||||||
python3 doc_scraper.py --interactive
|
python3 cli/doc_scraper.py --interactive
|
||||||
```
|
```
|
||||||
|
|
||||||
### Quick mode (minimal config)
|
### Quick mode (minimal config)
|
||||||
```bash
|
```bash
|
||||||
python3 doc_scraper.py --name react --url https://react.dev/ --description "React framework"
|
python3 cli/doc_scraper.py --name react --url https://react.dev/ --description "React framework"
|
||||||
```
|
```
|
||||||
|
|
||||||
### Skip scraping (use cached data)
|
### Skip scraping (use cached data)
|
||||||
```bash
|
```bash
|
||||||
python3 doc_scraper.py --config configs/godot.json --skip-scrape
|
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
|
||||||
|
```
|
||||||
|
|
||||||
|
### Resume interrupted scrapes
|
||||||
|
```bash
|
||||||
|
# If scrape was interrupted
|
||||||
|
python3 cli/doc_scraper.py --config configs/godot.json --resume
|
||||||
|
|
||||||
|
# Start fresh (clear checkpoint)
|
||||||
|
python3 cli/doc_scraper.py --config configs/godot.json --fresh
|
||||||
|
```
|
||||||
|
|
||||||
|
### Large documentation (10K-40K+ pages)
|
||||||
|
```bash
|
||||||
|
# 1. Estimate page count
|
||||||
|
python3 cli/estimate_pages.py configs/godot.json
|
||||||
|
|
||||||
|
# 2. Split into focused sub-skills
|
||||||
|
python3 cli/split_config.py configs/godot.json --strategy router
|
||||||
|
|
||||||
|
# 3. Generate router skill
|
||||||
|
python3 cli/generate_router.py configs/godot-*.json
|
||||||
|
|
||||||
|
# 4. Package multiple skills
|
||||||
|
python3 cli/package_multi.py output/godot*/
|
||||||
```
|
```
|
||||||
|
|
||||||
### AI-powered SKILL.md enhancement
|
### AI-powered SKILL.md enhancement
|
||||||
@@ -43,20 +67,35 @@ python3 doc_scraper.py --config configs/godot.json --skip-scrape
|
|||||||
# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
|
# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
|
||||||
pip3 install anthropic
|
pip3 install anthropic
|
||||||
export ANTHROPIC_API_KEY=sk-ant-...
|
export ANTHROPIC_API_KEY=sk-ant-...
|
||||||
python3 doc_scraper.py --config configs/react.json --enhance
|
python3 cli/doc_scraper.py --config configs/react.json --enhance
|
||||||
|
|
||||||
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
|
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
|
||||||
python3 doc_scraper.py --config configs/react.json --enhance-local
|
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
|
||||||
|
|
||||||
# Option 3: Standalone after scraping (API-based)
|
# Option 3: Standalone after scraping (API-based)
|
||||||
python3 enhance_skill.py output/react/
|
python3 cli/enhance_skill.py output/react/
|
||||||
|
|
||||||
# Option 4: Standalone after scraping (LOCAL, no API key)
|
# Option 4: Standalone after scraping (LOCAL, no API key)
|
||||||
python3 enhance_skill_local.py output/react/
|
python3 cli/enhance_skill_local.py output/react/
|
||||||
```
|
```
|
||||||
|
|
||||||
The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes reference files and enhances SKILL.md automatically. This requires Claude Code Max plan but no API key.
|
The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes reference files and enhances SKILL.md automatically. This requires Claude Code Max plan but no API key.
|
||||||
|
|
||||||
|
### MCP Integration (Claude Code)
|
||||||
|
```bash
|
||||||
|
# One-time setup
|
||||||
|
./setup_mcp.sh
|
||||||
|
|
||||||
|
# Then in Claude Code, use natural language:
|
||||||
|
"List all available configs"
|
||||||
|
"Generate config for Tailwind at https://tailwindcss.com/docs"
|
||||||
|
"Split configs/godot.json using router strategy"
|
||||||
|
"Generate router for configs/godot-*.json"
|
||||||
|
"Package skill at output/react/"
|
||||||
|
```
|
||||||
|
|
||||||
|
8 MCP tools available: list_configs, generate_config, validate_config, estimate_pages, scrape_docs, package_skill, split_config, generate_router
|
||||||
|
|
||||||
### Test with limited pages (edit config first)
|
### Test with limited pages (edit config first)
|
||||||
Set `"max_pages": 20` in the config file to test with fewer pages.
|
Set `"max_pages": 20` in the config file to test with fewer pages.
|
||||||
|
|
||||||
@@ -84,19 +123,35 @@ The entire tool is contained in `doc_scraper.py` (~737 lines). It follows a clas
|
|||||||
|
|
||||||
### Directory Structure
|
### Directory Structure
|
||||||
```
|
```
|
||||||
doc-to-skill/
|
Skill_Seekers/
|
||||||
├── doc_scraper.py # Main scraping & building tool
|
├── cli/ # CLI tools
|
||||||
├── enhance_skill.py # AI enhancement (API-based)
|
│ ├── doc_scraper.py # Main scraping & building tool
|
||||||
├── enhance_skill_local.py # AI enhancement (LOCAL, no API)
|
│ ├── enhance_skill.py # AI enhancement (API-based)
|
||||||
├── configs/ # Preset configurations
|
│ ├── enhance_skill_local.py # AI enhancement (LOCAL, no API)
|
||||||
|
│ ├── estimate_pages.py # Page count estimator
|
||||||
|
│ ├── split_config.py # Large docs splitter (NEW)
|
||||||
|
│ ├── generate_router.py # Router skill generator (NEW)
|
||||||
|
│ ├── package_skill.py # Single skill packager
|
||||||
|
│ └── package_multi.py # Multi-skill packager (NEW)
|
||||||
|
├── mcp/ # MCP server
|
||||||
|
│ ├── server.py # 8 MCP tools (includes split/router)
|
||||||
|
│ └── README.md
|
||||||
|
├── configs/ # Preset configurations
|
||||||
│ ├── godot.json
|
│ ├── godot.json
|
||||||
|
│ ├── godot-large-example.json # Large docs example (NEW)
|
||||||
│ ├── react.json
|
│ ├── react.json
|
||||||
│ ├── steam-inventory.json
|
|
||||||
│ └── ...
|
│ └── ...
|
||||||
└── output/
|
├── docs/ # Documentation
|
||||||
|
│ ├── CLAUDE.md # Technical architecture (this file)
|
||||||
|
│ ├── LARGE_DOCUMENTATION.md # Large docs guide (NEW)
|
||||||
|
│ ├── ENHANCEMENT.md
|
||||||
|
│ ├── MCP_SETUP.md
|
||||||
|
│ └── ...
|
||||||
|
└── output/ # Generated output (git-ignored)
|
||||||
├── {name}_data/ # Raw scraped data (cached)
|
├── {name}_data/ # Raw scraped data (cached)
|
||||||
│ ├── pages/ # Individual page JSONs
|
│ ├── pages/ # Individual page JSONs
|
||||||
│ └── summary.json # Scraping summary
|
│ ├── summary.json # Scraping summary
|
||||||
|
│ └── checkpoint.json # Resume checkpoint (NEW)
|
||||||
└── {name}/ # Generated skill
|
└── {name}/ # Generated skill
|
||||||
├── SKILL.md # Main skill file with examples
|
├── SKILL.md # Main skill file with examples
|
||||||
├── SKILL.md.backup # Backup (if enhanced)
|
├── SKILL.md.backup # Backup (if enhanced)
|
||||||
@@ -124,6 +179,14 @@ Config files in `configs/*.json` contain:
|
|||||||
- `categories`: Keyword-based categorization mapping
|
- `categories`: Keyword-based categorization mapping
|
||||||
- `rate_limit`: Delay between requests (seconds)
|
- `rate_limit`: Delay between requests (seconds)
|
||||||
- `max_pages`: Maximum pages to scrape
|
- `max_pages`: Maximum pages to scrape
|
||||||
|
- `split_strategy`: (Optional) How to split large docs: "auto", "category", "router", "size"
|
||||||
|
- `split_config`: (Optional) Split configuration
|
||||||
|
- `target_pages_per_skill`: Pages per sub-skill (default: 5000)
|
||||||
|
- `create_router`: Create router/hub skill (default: true)
|
||||||
|
- `split_by_categories`: Category names to split by
|
||||||
|
- `checkpoint`: (Optional) Checkpoint/resume configuration
|
||||||
|
- `enabled`: Enable checkpointing (default: false)
|
||||||
|
- `interval`: Save every N pages (default: 1000)
|
||||||
|
|
||||||
### Key Features
|
### Key Features
|
||||||
|
|
||||||
@@ -154,6 +217,20 @@ Config files in `configs/*.json` contain:
|
|||||||
- Extracts best examples, explains key concepts, adds navigation guidance
|
- Extracts best examples, explains key concepts, adds navigation guidance
|
||||||
- Success rate: 9/10 quality (based on steam-economy test)
|
- Success rate: 9/10 quality (based on steam-economy test)
|
||||||
|
|
||||||
|
**Large Documentation Support (NEW)**: Handle 10K-40K+ page documentation:
|
||||||
|
- `split_config.py`: Split large configs into multiple focused sub-skills
|
||||||
|
- `generate_router.py`: Create intelligent router/hub skills that direct queries
|
||||||
|
- `package_multi.py`: Package multiple skills at once
|
||||||
|
- 4 split strategies: auto, category, router, size
|
||||||
|
- Parallel scraping support for faster processing
|
||||||
|
- MCP integration for natural language usage
|
||||||
|
|
||||||
|
**Checkpoint/Resume (NEW)**: Never lose progress on long scrapes:
|
||||||
|
- Auto-saves every N pages (configurable, default: 1000)
|
||||||
|
- Resume with `--resume` flag
|
||||||
|
- Clear checkpoint with `--fresh` flag
|
||||||
|
- Saves on interruption (Ctrl+C)
|
||||||
|
|
||||||
## Key Code Locations
|
## Key Code Locations
|
||||||
|
|
||||||
- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
|
- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
|
||||||
@@ -172,11 +249,11 @@ Config files in `configs/*.json` contain:
|
|||||||
### First time scraping (with scraping)
|
### First time scraping (with scraping)
|
||||||
```bash
|
```bash
|
||||||
# 1. Scrape + Build
|
# 1. Scrape + Build
|
||||||
python3 doc_scraper.py --config configs/godot.json
|
python3 cli/doc_scraper.py --config configs/godot.json
|
||||||
# Time: 20-40 minutes
|
# Time: 20-40 minutes
|
||||||
|
|
||||||
# 2. Package (assuming skill-creator is available)
|
# 2. Package
|
||||||
python3 package_skill.py output/godot/
|
python3 cli/package_skill.py output/godot/
|
||||||
|
|
||||||
# Result: godot.zip
|
# Result: godot.zip
|
||||||
```
|
```
|
||||||
@@ -184,24 +261,54 @@ python3 package_skill.py output/godot/
|
|||||||
### Using cached data (fast iteration)
|
### Using cached data (fast iteration)
|
||||||
```bash
|
```bash
|
||||||
# 1. Use existing data
|
# 1. Use existing data
|
||||||
python3 doc_scraper.py --config configs/godot.json --skip-scrape
|
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
|
||||||
# Time: 1-3 minutes
|
# Time: 1-3 minutes
|
||||||
|
|
||||||
# 2. Package
|
# 2. Package
|
||||||
python3 package_skill.py output/godot/
|
python3 cli/package_skill.py output/godot/
|
||||||
```
|
```
|
||||||
|
|
||||||
### Creating a new framework config
|
### Creating a new framework config
|
||||||
```bash
|
```bash
|
||||||
# Option 1: Interactive
|
# Option 1: Interactive
|
||||||
python3 doc_scraper.py --interactive
|
python3 cli/doc_scraper.py --interactive
|
||||||
|
|
||||||
# Option 2: Copy and modify
|
# Option 2: Copy and modify
|
||||||
cp configs/react.json configs/myframework.json
|
cp configs/react.json configs/myframework.json
|
||||||
# Edit configs/myframework.json
|
# Edit configs/myframework.json
|
||||||
python3 doc_scraper.py --config configs/myframework.json
|
python3 cli/doc_scraper.py --config configs/myframework.json
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Large documentation workflow (40K pages)
|
||||||
|
```bash
|
||||||
|
# 1. Estimate page count (fast, 1-2 minutes)
|
||||||
|
python3 cli/estimate_pages.py configs/godot.json
|
||||||
|
|
||||||
|
# 2. Split into focused sub-skills
|
||||||
|
python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000
|
||||||
|
|
||||||
|
# Creates: godot-scripting.json, godot-2d.json, godot-3d.json, etc.
|
||||||
|
|
||||||
|
# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
|
||||||
|
for config in configs/godot-*.json; do
|
||||||
|
python3 cli/doc_scraper.py --config $config &
|
||||||
|
done
|
||||||
|
wait
|
||||||
|
|
||||||
|
# 4. Generate intelligent router skill
|
||||||
|
python3 cli/generate_router.py configs/godot-*.json
|
||||||
|
|
||||||
|
# 5. Package all skills
|
||||||
|
python3 cli/package_multi.py output/godot*/
|
||||||
|
|
||||||
|
# 6. Upload all .zip files to Claude
|
||||||
|
# Result: Router automatically directs queries to the right sub-skill!
|
||||||
|
```
|
||||||
|
|
||||||
|
**Time savings:** Parallel scraping reduces 20-40 hours to 4-8 hours
|
||||||
|
|
||||||
|
**See full guide:** [Large Documentation Guide](LARGE_DOCUMENTATION.md)
|
||||||
|
|
||||||
## Testing Selectors
|
## Testing Selectors
|
||||||
|
|
||||||
To find the right CSS selectors for a documentation site:
|
To find the right CSS selectors for a documentation site:
|
||||||
|
|||||||
@@ -2,10 +2,10 @@
|
|||||||
|
|
||||||
Step-by-step guide to set up the Skill Seeker MCP server with Claude Code.
|
Step-by-step guide to set up the Skill Seeker MCP server with Claude Code.
|
||||||
|
|
||||||
**✅ Fully Tested and Working**: All 6 MCP tools verified in production use with Claude Code
|
**✅ Fully Tested and Working**: All 8 MCP tools verified in production use with Claude Code
|
||||||
- ✅ 25 comprehensive unit tests (100% pass rate)
|
- ✅ 31 comprehensive unit tests (100% pass rate)
|
||||||
- ✅ Integration tested via actual Claude Code MCP protocol
|
- ✅ Integration tested via actual Claude Code MCP protocol
|
||||||
- ✅ All 6 tools working with natural language commands
|
- ✅ All 8 tools working with natural language commands (includes large docs support!)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -11,6 +11,8 @@ This MCP server allows Claude Code to use Skill Seeker's tools directly through
|
|||||||
- Scrape documentation and build skills
|
- Scrape documentation and build skills
|
||||||
- Package skills into `.zip` files
|
- Package skills into `.zip` files
|
||||||
- List and validate configurations
|
- List and validate configurations
|
||||||
|
- **NEW:** Split large documentation (10K-40K+ pages) into focused sub-skills
|
||||||
|
- **NEW:** Generate intelligent router/hub skills for split documentation
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
@@ -70,7 +72,7 @@ You should see a list of preset configurations (Godot, React, Vue, etc.).
|
|||||||
|
|
||||||
## Available Tools
|
## Available Tools
|
||||||
|
|
||||||
The MCP server exposes 6 tools:
|
The MCP server exposes 8 tools:
|
||||||
|
|
||||||
### 1. `generate_config`
|
### 1. `generate_config`
|
||||||
Create a new configuration file for any documentation website.
|
Create a new configuration file for any documentation website.
|
||||||
@@ -145,6 +147,44 @@ Validate a config file for errors.
|
|||||||
Validate configs/godot.json
|
Validate configs/godot.json
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 7. `split_config` (NEW)
|
||||||
|
Split large documentation config into multiple focused skills. For 10K+ page documentation.
|
||||||
|
|
||||||
|
**Parameters:**
|
||||||
|
- `config_path` (required): Path to config JSON file (e.g., "configs/godot.json")
|
||||||
|
- `strategy` (optional): Split strategy - "auto", "none", "category", "router", "size" (default: "auto")
|
||||||
|
- `target_pages` (optional): Target pages per skill (default: 5000)
|
||||||
|
- `dry_run` (optional): Preview without saving files (default: false)
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```
|
||||||
|
Split configs/godot.json using router strategy with 5000 pages per skill
|
||||||
|
```
|
||||||
|
|
||||||
|
**Strategies:**
|
||||||
|
- **auto** - Intelligently detects best strategy based on page count and config
|
||||||
|
- **category** - Split by documentation categories (creates focused sub-skills)
|
||||||
|
- **router** - Create router/hub skill + specialized sub-skills (RECOMMENDED for 10K+ pages)
|
||||||
|
- **size** - Split every N pages (for docs without clear categories)
|
||||||
|
|
||||||
|
### 8. `generate_router` (NEW)
|
||||||
|
Generate router/hub skill for split documentation. Creates intelligent routing to sub-skills.
|
||||||
|
|
||||||
|
**Parameters:**
|
||||||
|
- `config_pattern` (required): Config pattern for sub-skills (e.g., "configs/godot-*.json")
|
||||||
|
- `router_name` (optional): Router skill name (inferred from configs if not provided)
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```
|
||||||
|
Generate router for configs/godot-*.json
|
||||||
|
```
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- Analyzes all sub-skill configs
|
||||||
|
- Extracts routing keywords from categories and names
|
||||||
|
- Creates router SKILL.md with intelligent routing logic
|
||||||
|
- Users can ask questions naturally, router directs to appropriate sub-skill
|
||||||
|
|
||||||
## Example Workflows
|
## Example Workflows
|
||||||
|
|
||||||
### Generate a New Skill from Scratch
|
### Generate a New Skill from Scratch
|
||||||
@@ -200,6 +240,54 @@ User: Scrape docs using configs/godot.json
|
|||||||
Claude: [Starts scraping...]
|
Claude: [Starts scraping...]
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Large Documentation (40K Pages) - NEW
|
||||||
|
|
||||||
|
```
|
||||||
|
User: Estimate pages for configs/godot.json
|
||||||
|
|
||||||
|
Claude: 📊 Estimated pages: 40,000
|
||||||
|
⚠️ Large documentation detected!
|
||||||
|
💡 Recommend splitting into multiple skills
|
||||||
|
|
||||||
|
User: Split configs/godot.json using router strategy
|
||||||
|
|
||||||
|
Claude: ✅ Split complete!
|
||||||
|
Created 5 sub-skills:
|
||||||
|
- godot-scripting.json (5,000 pages)
|
||||||
|
- godot-2d.json (8,000 pages)
|
||||||
|
- godot-3d.json (10,000 pages)
|
||||||
|
- godot-physics.json (6,000 pages)
|
||||||
|
- godot-shaders.json (11,000 pages)
|
||||||
|
|
||||||
|
User: Scrape all godot sub-skills in parallel
|
||||||
|
|
||||||
|
Claude: [Starts scraping all 5 configs in parallel...]
|
||||||
|
✅ All skills created in 4-8 hours instead of 20-40!
|
||||||
|
|
||||||
|
User: Generate router for configs/godot-*.json
|
||||||
|
|
||||||
|
Claude: ✅ Router skill created at output/godot/
|
||||||
|
Routing logic:
|
||||||
|
- "scripting", "gdscript" → godot-scripting
|
||||||
|
- "2d", "sprites", "tilemap" → godot-2d
|
||||||
|
- "3d", "meshes", "camera" → godot-3d
|
||||||
|
- "physics", "collision" → godot-physics
|
||||||
|
- "shaders", "visual shader" → godot-shaders
|
||||||
|
|
||||||
|
User: Package all godot skills
|
||||||
|
|
||||||
|
Claude: ✅ 6 skills packaged:
|
||||||
|
- godot.zip (router)
|
||||||
|
- godot-scripting.zip
|
||||||
|
- godot-2d.zip
|
||||||
|
- godot-3d.zip
|
||||||
|
- godot-physics.zip
|
||||||
|
- godot-shaders.zip
|
||||||
|
|
||||||
|
Upload all to Claude!
|
||||||
|
Users just ask questions naturally - router handles routing!
|
||||||
|
```
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
### Server Structure
|
### Server Structure
|
||||||
@@ -262,10 +350,12 @@ python3 -m pytest tests/test_mcp_server.py -v
|
|||||||
- **package_skill** (2 tests)
|
- **package_skill** (2 tests)
|
||||||
- **list_configs** (3 tests)
|
- **list_configs** (3 tests)
|
||||||
- **validate_config** (3 tests)
|
- **validate_config** (3 tests)
|
||||||
|
- **split_config** (3 tests) - NEW
|
||||||
|
- **generate_router** (3 tests) - NEW
|
||||||
- **Tool routing** (2 tests)
|
- **Tool routing** (2 tests)
|
||||||
- **Integration** (1 test)
|
- **Integration** (1 test)
|
||||||
|
|
||||||
**Total: 25 tests | Pass rate: 100%**
|
**Total: 31 tests | Pass rate: 100%**
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
@@ -401,9 +491,14 @@ For API-based enhancement (requires Anthropic API key):
|
|||||||
| Generate config | <1s | Creates JSON file |
|
| Generate config | <1s | Creates JSON file |
|
||||||
| Validate config | <1s | Quick validation |
|
| Validate config | <1s | Quick validation |
|
||||||
| Estimate pages | 1-2min | Fast, no data download |
|
| Estimate pages | 1-2min | Fast, no data download |
|
||||||
|
| Split config | 1-3min | Analyzes and creates sub-configs |
|
||||||
|
| Generate router | 10-30s | Creates router SKILL.md |
|
||||||
| Scrape docs | 15-45min | First time only |
|
| Scrape docs | 15-45min | First time only |
|
||||||
|
| Scrape docs (40K pages) | 20-40hrs | Sequential |
|
||||||
|
| Scrape docs (40K pages, parallel) | 4-8hrs | 5 skills in parallel |
|
||||||
| Scrape (cached) | <1min | With `skip_scrape` |
|
| Scrape (cached) | <1min | With `skip_scrape` |
|
||||||
| Package skill | 5-10s | Creates .zip |
|
| Package skill | 5-10s | Creates .zip |
|
||||||
|
| Package multi | 30-60s | Packages 5-10 skills |
|
||||||
|
|
||||||
## Documentation
|
## Documentation
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user