# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable `.zip` file for Claude.

## Prerequisites

**Python Version:** Python 3.7 or higher

**Required Dependencies:**
```bash
pip3 install requests beautifulsoup4
```

**Optional (for API-based enhancement):**
```bash
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
```

## Core Commands

### Quick Start - Use a Preset

```bash
# Scrape and build with a preset configuration
python3 cli/doc_scraper.py --config configs/godot.json
python3 cli/doc_scraper.py --config configs/react.json
python3 cli/doc_scraper.py --config configs/vue.json
python3 cli/doc_scraper.py --config configs/django.json
python3 cli/doc_scraper.py --config configs/fastapi.json
```

### First-Time User Workflow (Recommended)

```bash
# 1. Install dependencies (one-time)
pip3 install requests beautifulsoup4

# 2. Estimate page count BEFORE scraping (fast, no data download)
python3 cli/estimate_pages.py configs/godot.json
# Time: ~1-2 minutes; shows estimated total pages and recommended max_pages

# 3. Scrape with local enhancement (uses Claude Code Max, no API key)
python3 cli/doc_scraper.py --config configs/godot.json --enhance-local
# Time: 20-40 minutes scraping + 60 seconds enhancement

# 4. Package the skill
python3 cli/package_skill.py output/godot/

# Result: godot.zip ready to upload to Claude
```

### Interactive Mode

```bash
# Step-by-step configuration wizard
python3 cli/doc_scraper.py --interactive
```

### Quick Mode (Minimal Config)

```bash
# Create skill from any documentation URL
python3 cli/doc_scraper.py --name react --url https://react.dev/ --description "React framework for UIs"
```

### Skip Scraping (Use Cached Data)

```bash
# Fast rebuild using previously scraped data
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
# Time: 1-3 minutes (no re-scraping needed)
```

### Enhancement Options

**LOCAL Enhancement (Recommended - No API Key Required):**
```bash
# During scraping
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# Standalone after scraping
python3 cli/enhance_skill_local.py output/react/
```

**API Enhancement (Alternative - Requires API Key):**
```bash
# During scraping
python3 cli/doc_scraper.py --config configs/react.json --enhance

# Standalone after scraping
python3 cli/enhance_skill.py output/react/
python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...
```

### Package the Skill

```bash
# Package skill directory into .zip file
python3 cli/package_skill.py output/godot/
# Result: output/godot.zip
```
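
To show roughly what the packaging step does, here is a minimal standard-library sketch. It zips a skill directory and skips `.backup` files, following the Package Phase under Data Flow. This is not the actual `package_skill.py` code. The function name `package_skill_dir` is hypothetical, and placing the zip next to the directory is an assumption based on the `output/godot.zip` result above.

```python
import zipfile
from pathlib import Path


def package_skill_dir(skill_dir: str) -> Path:
    """Hypothetical sketch: zip <skill_dir> into <parent>/<name>.zip, skipping *.backup."""
    src = Path(skill_dir).resolve()
    zip_path = src.parent / (src.name + ".zip")  # output/godot/ -> output/godot.zip
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Only files are written (empty directories are not stored); backups are skipped
            if path.is_dir() or path.name.endswith(".backup"):
                continue
            # Archive paths are relative to the skill directory, e.g. references/index.md
            zf.write(path, arcname=path.relative_to(src).as_posix())
    return zip_path
```

In this sketch, empty `scripts/` and `assets/` directories would not appear in the archive. The real packager may handle them differently.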

### Force Re-scrape

```bash
# Delete cached data and re-scrape from scratch
rm -rf output/godot_data/
python3 cli/doc_scraper.py --config configs/godot.json
```

### Estimate Page Count (Before Scraping)

```bash
# Quick estimation - discover up to 100 pages
python3 cli/estimate_pages.py configs/react.json --max-discovery 100
# Time: ~30-60 seconds

# Full estimation - discover up to 1000 pages (default)
python3 cli/estimate_pages.py configs/godot.json
# Time: ~1-2 minutes

# Deep estimation - discover up to 2000 pages
python3 cli/estimate_pages.py configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes

# What it shows:
# - Estimated total pages
# - Recommended max_pages value
# - Estimated scraping time
# - Discovery rate (pages/sec)
```

**Why use estimation:**
- Validates config URL patterns before a full scrape
- Helps set an optimal `max_pages` value
- Estimates total scraping time
- Fast (discovers links with minimal parsing)
- No scraped data is stored
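
The `rate_limit` delay between requests sets a floor on scraping time: pages × delay. The sketch below computes that lower bound using the sample config values shown under Configuration File Structure. It is a hypothetical helper, not the estimator's formula, which is not documented here. It ignores download and parsing time.

```python
def min_scrape_seconds(pages: int, rate_limit: float) -> float:
    """Lower bound on scrape time from the per-request delay alone.

    Hypothetical helper: assumes one `rate_limit` delay per page and
    ignores download and parsing time, so real runs take longer.
    """
    if pages < 0 or rate_limit < 0:
        raise ValueError("pages and rate_limit must be non-negative")
    return pages * rate_limit


# Example values from the sample config: max_pages=500, rate_limit=0.5
seconds = min_scrape_seconds(500, 0.5)
print("at least {:.0f} s (~{:.1f} min) of delays".format(seconds, seconds / 60))
# at least 250 s (~4.2 min) of delays
```

Real runs take much longer than this bound (see the Performance table), because every page must also be fetched and parsed.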

## Repository Architecture

### File Structure

```
Skill_Seekers/
├── cli/doc_scraper.py          # Main tool (single-file, ~790 lines)
├── cli/estimate_pages.py       # Page count estimator (fast, no data)
├── cli/enhance_skill.py        # AI enhancement (API-based)
├── cli/enhance_skill_local.py  # AI enhancement (LOCAL, no API)
├── cli/package_skill.py        # Skill packager
├── cli/run_tests.py            # Test runner (71 tests)
├── configs/                    # Preset configurations
│   ├── godot.json
│   ├── react.json
│   ├── vue.json
│   ├── django.json
│   ├── fastapi.json
│   └── steam-economy-complete.json
├── docs/                       # Documentation
│   ├── CLAUDE.md               # Detailed technical architecture
│   ├── ENHANCEMENT.md          # Enhancement guide
│   └── UPLOAD_GUIDE.md         # How to upload skills
└── output/                     # Generated output (git-ignored)
    ├── {name}_data/            # Scraped raw data (cached)
    │   ├── pages/*.json        # Individual page data
    │   └── summary.json        # Scraping summary
    └── {name}/                 # Built skill directory
        ├── SKILL.md            # Main skill file
        ├── SKILL.md.backup     # Backup (if enhanced)
        ├── references/         # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/            # Empty (user scripts)
        └── assets/             # Empty (user assets)
```

### Data Flow

1. **Scrape Phase** (`scrape_all()` in doc_scraper.py:228-251):
   - Input: Config JSON (name, base_url, selectors, url_patterns, categories)
   - Process: BFS traversal from base_url, respecting include/exclude patterns
   - Output: `output/{name}_data/pages/*.json` + `summary.json`

2. **Build Phase** (`build_skill()` in doc_scraper.py:561-601):
   - Input: Scraped JSON data from `output/{name}_data/`
   - Process: Load pages → Smart categorize → Extract patterns → Generate references
   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`

3. **Enhancement Phase** (optional):
   - Input: Built skill directory with references
   - Process: Claude analyzes references and rewrites SKILL.md
   - Output: Enhanced SKILL.md with real examples and guidance

4. **Package Phase**:
   - Input: Skill directory
   - Process: Zip all files (excluding .backup)
   - Output: `{name}.zip`
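
The Scrape Phase's breadth-first traversal can be modelled with the standard library. This is an illustrative sketch, not `scrape_all()`. The hypothetical `get_links` callback stands in for fetching a page and extracting its links. `url_allowed` is a simplified, hypothetical stand-in for URL validation. The rule that an empty `include` list allows every URL under `base_url` is an assumption.

```python
import time
from collections import deque
from typing import Callable, Iterable, List


def url_allowed(url: str, base_url: str,
                include: List[str], exclude: List[str]) -> bool:
    """Hypothetical URL filter modelled on the url_patterns config keys."""
    if not url.startswith(base_url):
        return False  # never leave the documentation site
    if any(pattern in url for pattern in exclude):
        return False
    # Assumption: an empty include list allows every URL under base_url
    return not include or any(pattern in url for pattern in include)


def bfs_scrape(base_url: str, get_links: Callable[[str], Iterable[str]],
               include: List[str], exclude: List[str],
               max_pages: int, rate_limit: float = 0.0) -> List[str]:
    """Visit pages breadth-first and return the URLs in visit order."""
    queue = deque([base_url])
    seen = {base_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # the real tool would save page JSON here
        for link in get_links(url):
            if link not in seen and url_allowed(link, base_url, include, exclude):
                seen.add(link)
                queue.append(link)
        if rate_limit:
            time.sleep(rate_limit)  # politeness delay between requests
    return visited


# Usage with a tiny in-memory "site" instead of real HTTP requests
site = {
    "https://docs.example.com/": ["https://docs.example.com/intro",
                                  "https://docs.example.com/_static/app.js"],
    "https://docs.example.com/intro": ["https://docs.example.com/api",
                                       "https://other.example.org/x"],
}
order = bfs_scrape("https://docs.example.com/", lambda u: site.get(u, []),
                   include=[], exclude=["/_static/"], max_pages=10)
print(order)
# ['https://docs.example.com/', 'https://docs.example.com/intro', 'https://docs.example.com/api']
```

The `seen` set prevents revisiting a page. `max_pages` caps the crawl even when the link graph is much larger.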

### Configuration File Structure

Config files (`configs/*.json`) define scraping behavior:

```json
{
  "name": "godot",
  "description": "When to use this skill",
  "base_url": "https://docs.godotengine.org/en/stable/",
  "selectors": {
    "main_content": "div[role='main']",
    "title": "title",
    "code_blocks": "pre"
  },
  "url_patterns": {
    "include": [],
    "exclude": ["/search.html", "/_static/"]
  },
  "categories": {
    "getting_started": ["introduction", "getting_started"],
    "scripting": ["scripting", "gdscript"],
    "api": ["api", "reference", "class"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
```

**Config Parameters:**
- `name`: Skill identifier (output directory name)
- `description`: When Claude should use this skill
- `base_url`: Starting URL for scraping
- `selectors.main_content`: CSS selector for main content (common: `article`, `main`, `div[role="main"]`)
- `selectors.title`: CSS selector for page title
- `selectors.code_blocks`: CSS selector for code samples
- `url_patterns.include`: Only scrape URLs containing these patterns
- `url_patterns.exclude`: Skip URLs containing these patterns
- `categories`: Keyword mapping for categorization
- `rate_limit`: Delay between requests (seconds)
- `max_pages`: Maximum pages to scrape
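
Before a long scrape, a quick sanity check of the config against the parameters above can catch mistakes early. This sketch is not part of the repository. The function name `check_config` is hypothetical, and so are its checks: required keys, an http(s) `base_url`, a set `main_content` selector, a positive `max_pages`, and a non-negative `rate_limit`. They are inferred from the parameter list, not taken from the tool's own validation.

```python
import json
from typing import List

REQUIRED_KEYS = ("name", "base_url", "selectors")  # assumed minimum


def check_config(text: str) -> List[str]:
    """Return a list of problems in a config JSON string (empty list = OK)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["invalid JSON: {}".format(exc)]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    problems = ["missing key: " + key for key in REQUIRED_KEYS if key not in config]
    base_url = config.get("base_url", "")
    if base_url and not base_url.startswith(("http://", "https://")):
        problems.append("base_url must start with http:// or https://")
    if "main_content" not in config.get("selectors", {}):
        problems.append("selectors.main_content is not set")
    max_pages = config.get("max_pages")
    if max_pages is not None and (not isinstance(max_pages, int) or max_pages <= 0):
        problems.append("max_pages must be a positive integer")
    rate_limit = config.get("rate_limit", 0)
    if not isinstance(rate_limit, (int, float)) or rate_limit < 0:
        problems.append("rate_limit must be a non-negative number")
    return problems


good = {
    "name": "godot",
    "base_url": "https://docs.godotengine.org/en/stable/",
    "selectors": {"main_content": "div[role='main']"},
    "rate_limit": 0.5,
    "max_pages": 500,
}
print(check_config(json.dumps(good)))  # []
print(check_config('{"max_pages": 20 // note\n}'))  # one "invalid JSON" problem: comments are not allowed
```

To check a real file, pass its text in, for example `check_config(open("configs/godot.json").read())`.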

## Key Features & Implementation

### Auto-Detect Existing Data

The tool checks for `output/{name}_data/` and prompts you to reuse it, avoiding a re-scrape (`check_existing_data()` in doc_scraper.py:653-660).
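
The reuse check comes down to a directory test plus a prompt. The sketch below is a rough approximation, not the code at doc_scraper.py:653-660. The function name `should_reuse_data`, the use of `summary.json` as the "data exists" marker, and the injectable `ask` callback (which lets it run non-interactively) are all assumptions.

```python
from pathlib import Path
from typing import Callable


def should_reuse_data(name: str, output_root: str = "output",
                      ask: Callable[[str], str] = input) -> bool:
    """Hypothetical sketch: True if cached data exists and the user agrees to reuse it."""
    data_dir = Path(output_root) / "{}_data".format(name)
    if not (data_dir / "summary.json").exists():
        return False  # nothing cached: a full scrape is needed
    page_count = len(list((data_dir / "pages").glob("*.json")))
    answer = ask("Found {} cached pages in {}. Reuse them? [y/n] ".format(page_count, data_dir))
    return answer.strip().lower() in ("y", "yes")
```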

### Language Detection

Detects code languages from:

1. CSS class attributes (`language-*`, `lang-*`)
2. Heuristics (keywords like `def`, `const`, `func`, etc.)

See: `detect_language()` in doc_scraper.py:135-165
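
A two-stage detector of the kind described (class attributes first, keyword heuristics second) might look like the sketch below. It is a simplified illustration, not `detect_language()` itself. The name `guess_language` and the small keyword table are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical keyword heuristics, tried in order; the first match wins.
# `func` is ambiguous (Go uses it too), so real heuristics need more context.
HEURISTICS = [
    ("gdscript", re.compile(r"^\s*(func|extends)\b", re.M)),
    ("python", re.compile(r"^\s*(def|import|from)\s", re.M)),
    ("javascript", re.compile(r"\b(const|let|function)\s|=>")),
]


def guess_language(css_classes: Iterable[str], code: str) -> str:
    """Guess a code block's language: CSS classes first, then keywords."""
    # Stage 1: class attributes such as "language-python" or "lang-js"
    for cls in css_classes:
        for prefix in ("language-", "lang-"):
            if cls.startswith(prefix) and len(cls) > len(prefix):
                return cls[len(prefix):]
    # Stage 2: keyword heuristics on the code text itself
    for language, pattern in HEURISTICS:
        if pattern.search(code):
            return language
    return "unknown"
```

Checking classes first makes sense because an explicit `language-*` class reflects the author's intent. Keyword matching is only a fallback for unannotated blocks.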

### Pattern Extraction

Looks for "Example:", "Pattern:", and "Usage:" markers in the content and extracts the code blocks that follow them (up to 5 per page).

See: `extract_patterns()` in doc_scraper.py:167-183
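
Marker-based extraction can be modelled over a page's content as an ordered list of blocks. The `(kind, text)` block representation and the function name `find_patterns` are hypothetical. The real `extract_patterns()` works on parsed HTML. Pairing each marker with the next code block after it is one reading of "following code blocks".

```python
from typing import Dict, List, Tuple

MARKERS = ("Example:", "Pattern:", "Usage:")


def find_patterns(blocks: List[Tuple[str, str]], limit: int = 5) -> List[Dict[str, str]]:
    """Pair each marker paragraph with the next code block after it.

    `blocks` is a page's content in document order, for example
    [("p", "Example: create a node"), ("code", "var n = Node.new()")].
    """
    patterns: List[Dict[str, str]] = []
    pending = None  # text of the most recent marker paragraph, if any
    for kind, text in blocks:
        if kind == "p" and any(marker in text for marker in MARKERS):
            pending = text
        elif kind == "code" and pending is not None:
            patterns.append({"description": pending, "code": text})
            pending = None
            if len(patterns) >= limit:  # per-page cap (5 in the description above)
                break
    return patterns


page = [
    ("p", "Nodes are the building blocks of a scene."),
    ("code", "print('no marker before this block')"),
    ("p", "Example: create a node"),
    ("code", "var n = Node.new()"),
]
print(find_patterns(page))
# [{'description': 'Example: create a node', 'code': 'var n = Node.new()'}]
```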

### Smart Categorization

- Scores pages against category keywords (3 points for a URL match, 2 for the title, 1 for the content)
- A page needs a score of 2 or more to be assigned to a category
- Auto-infers categories from URL segments if none are provided
- Falls back to the "other" category

See: `smart_categorize()` and `infer_categories()` in doc_scraper.py:282-351
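
The scoring rule (3 points for a URL match, 2 for the title, 1 for the content, threshold 2) can be written out as below. This is an illustrative sketch, not `smart_categorize()`. Counting each keyword at most once per field, picking the highest-scoring category, and letting the first category win ties are assumptions. The function name `categorize` is hypothetical.

```python
from typing import Dict, List


def categorize(url: str, title: str, content: str,
               categories: Dict[str, List[str]], threshold: int = 2) -> str:
    """Return the best-scoring category, or "other" if none reaches the threshold."""
    url, title, content = url.lower(), title.lower(), content.lower()
    best, best_score = "other", 0
    for category, keywords in categories.items():
        score = 0
        for keyword in keywords:
            keyword = keyword.lower()
            if keyword in url:
                score += 3
            if keyword in title:
                score += 2
            if keyword in content:
                score += 1
        # Strictly greater: on a tie, the earlier category is kept
        if score >= threshold and score > best_score:
            best, best_score = category, score
    return best


cats = {
    "getting_started": ["introduction", "getting_started"],
    "scripting": ["scripting", "gdscript"],
    "api": ["api", "reference", "class"],
}
print(categorize("https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/index.html",
                 "GDScript reference", "", cats))  # scripting
print(categorize("https://example.com/page", "Page", "see the api docs", cats))  # other
```

The second call shows the threshold at work: a single content match scores 1, which is below 2, so the page falls back to "other".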

### Enhanced SKILL.md Generation

The generated SKILL.md includes:
- Real code examples from the documentation (language-annotated)
- Quick reference patterns extracted from the docs
- A common patterns section
- Category file listings

See: `create_enhanced_skill_md()` in doc_scraper.py:426-542

## Common Workflows

### First Time (With Scraping + Enhancement)

```bash
# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
python3 cli/doc_scraper.py --config configs/godot.json --enhance-local

# 2. Wait for the enhancement terminal to close (~60 seconds)

# 3. Verify quality
cat output/godot/SKILL.md

# 4. Package
python3 cli/package_skill.py output/godot/

# Result: godot.zip ready for Claude
# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)
```

### Using Cached Data (Fast Iteration)

```bash
# 1. Use existing data + Local Enhancement
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
python3 cli/enhance_skill_local.py output/godot/

# 2. Package
python3 cli/package_skill.py output/godot/

# Time: 1-3 minutes (build) + 60 seconds (enhancement)
```

### Without Enhancement (Basic)

```bash
# 1. Scrape + Build (no enhancement)
python3 cli/doc_scraper.py --config configs/godot.json

# 2. Package
python3 cli/package_skill.py output/godot/

# Note: SKILL.md will be a basic template; enhancement is recommended
# Time: 20-40 minutes
```

### Creating a New Framework Config

**Option 1: Interactive**
```bash
python3 cli/doc_scraper.py --interactive
# Follow the prompts; the wizard creates the config for you
```

**Option 2: Copy and Modify**
```bash
# Copy a preset
cp configs/react.json configs/myframework.json

# Edit it
nano configs/myframework.json

# Test with limited pages first:
# set "max_pages": 20 in the config

# Use it
python3 cli/doc_scraper.py --config configs/myframework.json
```

## Testing & Verification

### Finding the Right CSS Selectors

Before creating a config, test selectors with BeautifulSoup:

```python
import requests
from bs4 import BeautifulSoup

url = "https://docs.example.com/page"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail fast on 4xx/5xx responses
soup = BeautifulSoup(response.content, 'html.parser')

# Try different main-content selectors (None means no match)
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('div.content'))

# Test code block selectors (an empty list means no match)
print(soup.select('pre code'))
print(soup.select('pre'))
```

### Verify Output Quality

After building, verify the skill quality:

```bash
# Check SKILL.md has real examples
cat output/godot/SKILL.md

# Check category structure
cat output/godot/references/index.md

# List all reference files
ls output/godot/references/

# Check specific category content
cat output/godot/references/getting_started.md

# Verify code samples have language detection
# (single quotes stop the shell from treating the backticks as command substitution)
grep -A 3 '```' output/godot/references/*.md | head -20
```
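
For more than a visual check of the `grep` output, a small script can count how many fenced code blocks in each reference file carry a language tag. This is a hypothetical helper, not part of the repository; you might save it as, say, `check_fences.py`. It assumes fences are plain triple backticks and ignores tilde fences.

```python
import sys
from pathlib import Path
from typing import Dict, Tuple

FENCE = "`" * 3  # a Markdown code fence marker


def fence_stats(text: str) -> Tuple[int, int]:
    """Count opening code fences with and without a language tag."""
    tagged = untagged = 0
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside:
            inside = False  # this marker closes the current block
        elif stripped[len(FENCE):].strip():
            inside, tagged = True, tagged + 1  # e.g. "python" after the marker
        else:
            inside, untagged = True, untagged + 1
    return tagged, untagged


def report(references_dir: str) -> Dict[str, Tuple[int, int]]:
    """Map each reference file name to its (tagged, untagged) counts."""
    return {path.name: fence_stats(path.read_text(encoding="utf-8"))
            for path in sorted(Path(references_dir).glob("*.md"))}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "output/godot/references"
    for name, (tagged, untagged) in report(target).items():
        print("{}: {} tagged, {} untagged".format(name, tagged, untagged))
```

A high untagged count for a file suggests that the language detection heuristics missed that page's code style.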

### Test with Limited Pages

For faster testing, set a small `max_pages` in your config (for example, 20 pages) and leave the other keys unchanged:

```json
{
  "max_pages": 20
}
```

## Troubleshooting

### No Content Extracted

**Problem:** Pages are scraped but the extracted content is empty

**Solution:** Check the `main_content` selector in your config. Try:
- `article`
- `main`
- `div[role="main"]`
- `div.content`

Use the BeautifulSoup testing approach above to find the right selector.

### Poor Categorization

**Problem:** Pages are not categorized well

**Solution:** Edit the `categories` section of your config with better keywords specific to the documentation structure. Check URL patterns in the scraped data:

```bash
# See what URLs were scraped
grep url output/godot_data/summary.json | head -20
```
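
If `grep` on `summary.json` is too noisy, you can list URLs from the per-page files instead. This sketch assumes each file in `output/{name}_data/pages/` is a JSON object with `url` and `title` keys. Those key names are not documented here, so inspect one page file first. The function name `scraped_urls` is hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import List, Tuple


def scraped_urls(data_dir: str, limit: int = 20) -> List[Tuple[str, str]]:
    """Return up to `limit` (url, title) pairs from pages/*.json (assumed keys)."""
    rows: List[Tuple[str, str]] = []
    for page_file in sorted(Path(data_dir, "pages").glob("*.json")):
        page = json.loads(page_file.read_text(encoding="utf-8"))
        rows.append((page.get("url", "?"), page.get("title", "")))
        if len(rows) >= limit:
            break
    return rows


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "output/godot_data"
    for url, title in scraped_urls(target):
        print(url, "|", title)
```

Seeing URLs next to titles makes it easier to choose keywords that will score on URL matches, which carry the most weight.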

### Data Exists but Is Not Reused

**Problem:** The tool won't reuse existing data

**Solution:** Force a re-scrape:
```bash
rm -rf output/myframework_data/
python3 cli/doc_scraper.py --config configs/myframework.json
```

### Rate Limiting Issues

**Problem:** Getting rate limited or blocked by the documentation server

**Solution:** Increase the `rate_limit` value in your config, for example from 0.5 to 1.0 seconds between requests:
```json
{
  "rate_limit": 1.0
}
```

### Package Path Error

**Problem:** doc_scraper.py suggests the wrong path for `package_skill.py`

**Expected output:**
```bash
python3 cli/package_skill.py output/godot/
```

**Not:**
```bash
python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/godot/
```

The correct command uses the local `cli/package_skill.py`, run from the repository root.

## Key Code Locations

- **URL validation**: `is_valid_url()` doc_scraper.py:49-64
- **Content extraction**: `extract_content()` doc_scraper.py:66-133
- **Language detection**: `detect_language()` doc_scraper.py:135-165
- **Pattern extraction**: `extract_patterns()` doc_scraper.py:167-183
- **Smart categorization**: `smart_categorize()` doc_scraper.py:282-323
- **Category inference**: `infer_categories()` doc_scraper.py:325-351
- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:353-372
- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:426-542
- **Scraping loop**: `scrape_all()` doc_scraper.py:228-251
- **Main workflow**: `main()` doc_scraper.py:663-789

## Enhancement Details

### LOCAL Enhancement (Recommended)
- Uses your Claude Code Max plan (no API costs)
- Opens a new terminal running Claude Code
- Analyzes reference files automatically
- Takes 30-60 seconds
- Quality: 9/10 (comparable to API version)
- Backs up the original SKILL.md to SKILL.md.backup

### API Enhancement (Alternative)
- Uses the Anthropic API (~$0.15-$0.30 per skill)
- Requires `ANTHROPIC_API_KEY`
- Same quality as LOCAL
- Faster (no terminal launch)
- Better for automation/CI

**What Enhancement Does:**
1. Reads reference documentation files
2. Analyzes content with Claude
3. Extracts 5-10 best code examples
4. Creates a comprehensive quick reference
5. Adds domain-specific key concepts
6. Provides navigation guidance for different skill levels
7. Transforms 75-line templates into 500+ line comprehensive guides

## Performance

| Task | Time | Notes |
|------|------|-------|
| Scraping | 15-45 min | First time only |
| Building | 1-3 min | Fast! |
| Re-building | <1 min | With `--skip-scrape` |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final zip |

## Additional Documentation

- **[README.md](README.md)** - Complete user documentation
- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps
- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Detailed technical architecture
- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure

## Notes for Claude Code

- This is a Python-based documentation scraper
- The core scraper is a single file (`cli/doc_scraper.py`, ~790 lines)
- No build system and minimal dependencies; tests run via `cli/run_tests.py` (71 tests)
- Output is cached and reusable
- Enhancement is optional but highly recommended
- All scraped data is stored in `output/` (git-ignored)