Add comprehensive CLAUDE.md for Claude Code integration
- Add root-level CLAUDE.md with complete guidance for Claude Code
- Include Python 3.7+ requirement
- Add first-time user workflow with all commands
- Include CSS selector testing with BeautifulSoup examples
- Add output quality verification commands
- Document force re-scrape instructions
- Fix package_skill.py path (remove hardcoded /mnt/skills reference)
- Add complete config file structure with real examples
- Include testing section for selector validation
- Add performance metrics table
- Document all key code locations with line numbers
- Organize by: quick start → architecture → workflows → troubleshooting
- Preserve existing docs/CLAUDE.md as detailed technical reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
493
CLAUDE.md
Normal file
@@ -0,0 +1,493 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable `.zip` file for Claude.

## Prerequisites

**Python Version:** Python 3.7 or higher

**Required Dependencies:**
```bash
pip3 install requests beautifulsoup4
```

**Optional (for API-based enhancement):**
```bash
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
```

## Core Commands

### Quick Start - Use a Preset

```bash
# Scrape and build with a preset configuration
python3 doc_scraper.py --config configs/godot.json
python3 doc_scraper.py --config configs/react.json
python3 doc_scraper.py --config configs/vue.json
python3 doc_scraper.py --config configs/django.json
python3 doc_scraper.py --config configs/fastapi.json
```

### First-Time User Workflow (Recommended)

```bash
# 1. Install dependencies (one-time)
pip3 install requests beautifulsoup4

# 2. Scrape with local enhancement (uses Claude Code Max, no API key)
python3 doc_scraper.py --config configs/godot.json --enhance-local
# Time: 20-40 minutes scraping + 60 seconds enhancement

# 3. Package the skill
python3 package_skill.py output/godot/

# Result: godot.zip ready to upload to Claude
```

### Interactive Mode

```bash
# Step-by-step configuration wizard
python3 doc_scraper.py --interactive
```

### Quick Mode (Minimal Config)

```bash
# Create skill from any documentation URL
python3 doc_scraper.py --name react --url https://react.dev/ --description "React framework for UIs"
```

### Skip Scraping (Use Cached Data)

```bash
# Fast rebuild using previously scraped data
python3 doc_scraper.py --config configs/godot.json --skip-scrape
# Time: 1-3 minutes (no scraping needed)
```

### Enhancement Options

**LOCAL Enhancement (Recommended - No API Key Required):**
```bash
# During scraping
python3 doc_scraper.py --config configs/react.json --enhance-local

# Standalone after scraping
python3 enhance_skill_local.py output/react/
```

**API Enhancement (Alternative - Requires API Key):**
```bash
# During scraping
python3 doc_scraper.py --config configs/react.json --enhance

# Standalone after scraping
python3 enhance_skill.py output/react/
python3 enhance_skill.py output/react/ --api-key sk-ant-...
```

### Package the Skill

```bash
# Package skill directory into .zip file
python3 package_skill.py output/godot/
# Result: output/godot.zip
```

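As a rough illustration of what the packaging step amounts to, the sketch below zips a built skill directory into a sibling `.zip` file and leaves out `.backup` files, which this document's Package Phase description says are excluded. The function name `zip_skill_dir` and the `SKILL.md` presence check are assumptions made for this example; it is not the contents of `package_skill.py`.

```python
import zipfile
from pathlib import Path


def zip_skill_dir(skill_dir: str) -> Path:
    """Zip a built skill directory, skipping *.backup files (hypothetical sketch)."""
    src = Path(skill_dir).resolve()
    if not (src / "SKILL.md").is_file():
        raise FileNotFoundError(f"No SKILL.md found in {src}")

    # Write the archive next to the skill directory, e.g. output/godot.zip,
    # so the archive itself is never picked up by the directory walk below.
    archive = src.parent / f"{src.name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file() and path.suffix != ".backup":
                # Store entries relative to the skill directory root.
                zf.write(path, path.relative_to(src).as_posix())
    return archive
```

Calling `zip_skill_dir("output/godot/")` under these assumptions would produce `output/godot.zip` containing `SKILL.md` and the `references/` files, but not `SKILL.md.backup`.
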
### Force Re-scrape

```bash
# Delete cached data and re-scrape from scratch
rm -rf output/godot_data/
python3 doc_scraper.py --config configs/godot.json
```

## Repository Architecture

### File Structure

```
Skill_Seekers/
├── doc_scraper.py              # Main tool (single-file, ~790 lines)
├── enhance_skill.py            # AI enhancement (API-based)
├── enhance_skill_local.py      # AI enhancement (LOCAL, no API)
├── package_skill.py            # Skill packager
├── configs/                    # Preset configurations
│   ├── godot.json
│   ├── react.json
│   ├── vue.json
│   ├── django.json
│   ├── fastapi.json
│   └── steam-economy-complete.json
├── docs/                       # Documentation
│   ├── CLAUDE.md               # Detailed technical architecture
│   ├── ENHANCEMENT.md          # Enhancement guide
│   └── UPLOAD_GUIDE.md         # How to upload skills
└── output/                     # Generated output (git-ignored)
    ├── {name}_data/            # Scraped raw data (cached)
    │   ├── pages/*.json        # Individual page data
    │   └── summary.json        # Scraping summary
    └── {name}/                 # Built skill directory
        ├── SKILL.md            # Main skill file
        ├── SKILL.md.backup     # Backup (if enhanced)
        ├── references/         # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/            # Empty (user scripts)
        └── assets/             # Empty (user assets)
```

### Data Flow

1. **Scrape Phase** (`scrape_all()` in doc_scraper.py:228-251):
   - Input: Config JSON (name, base_url, selectors, url_patterns, categories)
   - Process: BFS traversal from base_url, respecting include/exclude patterns
   - Output: `output/{name}_data/pages/*.json` + `summary.json`

2. **Build Phase** (`build_skill()` in doc_scraper.py:561-601):
   - Input: Scraped JSON data from `output/{name}_data/`
   - Process: Load pages → Smart categorize → Extract patterns → Generate references
   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`

3. **Enhancement Phase** (optional):
   - Input: Built skill directory with references
   - Process: Claude analyzes references and rewrites SKILL.md
   - Output: Enhanced SKILL.md with real examples and guidance

4. **Package Phase**:
   - Input: Skill directory
   - Process: Zip all files (excluding .backup)
   - Output: `{name}.zip`

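The scrape phase above can be pictured as a breadth-first crawl guarded by a URL filter. The following is a minimal sketch under stated assumptions: `is_allowed`, `crawl`, and the `get_links` callback are hypothetical names, and plain substring matching with an empty `include` list meaning "no restriction" is a guess based on the config description, not a copy of `scrape_all()` or `is_valid_url()`.

```python
from collections import deque
from typing import Callable, Iterable, List


def is_allowed(url: str, base_url: str, include: List[str], exclude: List[str]) -> bool:
    """Substring-based URL filter (assumed behaviour, see lead-in)."""
    if not url.startswith(base_url):
        return False  # stay inside the documentation site
    if any(pattern in url for pattern in exclude):
        return False
    # Assumption: an empty include list places no restriction on URLs.
    return not include or any(pattern in url for pattern in include)


def crawl(
    base_url: str,
    get_links: Callable[[str], Iterable[str]],
    include: List[str],
    exclude: List[str],
    max_pages: int,
) -> List[str]:
    """Visit pages breadth-first from base_url and return them in visit order."""
    queue = deque([base_url])
    seen = {base_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen and is_allowed(link, base_url, include, exclude):
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the link extractor in as `get_links` keeps the traversal testable without network access; a real scraper would fetch each page and pull its `<a href>` targets at that point.
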
### Configuration File Structure

Config files (`configs/*.json`) define scraping behavior:

```json
{
  "name": "godot",
  "description": "When to use this skill",
  "base_url": "https://docs.godotengine.org/en/stable/",
  "selectors": {
    "main_content": "div[role='main']",
    "title": "title",
    "code_blocks": "pre"
  },
  "url_patterns": {
    "include": [],
    "exclude": ["/search.html", "/_static/"]
  },
  "categories": {
    "getting_started": ["introduction", "getting_started"],
    "scripting": ["scripting", "gdscript"],
    "api": ["api", "reference", "class"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
```

**Config Parameters:**
- `name`: Skill identifier (output directory name)
- `description`: When Claude should use this skill
- `base_url`: Starting URL for scraping
- `selectors.main_content`: CSS selector for main content (common: `article`, `main`, `div[role="main"]`)
- `selectors.title`: CSS selector for page title
- `selectors.code_blocks`: CSS selector for code samples
- `url_patterns.include`: Only scrape URLs containing these patterns
- `url_patterns.exclude`: Skip URLs containing these patterns
- `categories`: Keyword mapping for categorization
- `rate_limit`: Delay between requests (seconds)
- `max_pages`: Maximum pages to scrape

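Before starting a long scrape it can help to sanity-check a config against the parameters listed above. The helper below is a hypothetical sketch: `check_config` is not part of the repository, and which keys are treated as required, as well as the fallback values `0.5` and `500`, are assumptions taken from the example config rather than from `doc_scraper.py`.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a config dict (empty list means OK)."""
    problems: List[str] = []

    # Assumption: name and base_url are the minimum needed to scrape.
    for key in ("name", "base_url"):
        if not isinstance(config.get(key), str) or not config[key]:
            problems.append(f"'{key}' must be a non-empty string")

    base_url = config.get("base_url")
    if isinstance(base_url, str) and base_url:
        if not base_url.startswith(("http://", "https://")):
            problems.append("'base_url' must start with http:// or https://")

    selectors = config.get("selectors", {})
    if not isinstance(selectors, dict) or not selectors.get("main_content"):
        problems.append("'selectors.main_content' is missing")

    rate_limit = config.get("rate_limit", 0.5)
    if not isinstance(rate_limit, (int, float)) or rate_limit < 0:
        problems.append("'rate_limit' must be a non-negative number")

    max_pages = config.get("max_pages", 500)
    if not isinstance(max_pages, int) or max_pages < 1:
        problems.append("'max_pages' must be a positive integer")

    return problems
```

A typical use would be to load a file with `json.load()` and print each returned problem before running the scraper.
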
## Key Features & Implementation

### Auto-Detect Existing Data
The tool checks for `output/{name}_data/` and prompts you to reuse it, which avoids re-scraping (`check_existing_data()` in doc_scraper.py:653-660).

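The check-then-prompt behaviour can be sketched as follows. This is an illustration only: `count_cached_pages`, `should_reuse_cache`, the prompt wording, and the injectable `ask` parameter are assumptions for this example, and the real `check_existing_data()` may look at different files.

```python
from pathlib import Path
from typing import Callable


def count_cached_pages(name: str, output_root: str = "output") -> int:
    """Count cached page files under output/{name}_data/pages/ (0 if absent)."""
    pages_dir = Path(output_root) / f"{name}_data" / "pages"
    if not pages_dir.is_dir():
        return 0
    return sum(1 for _ in pages_dir.glob("*.json"))


def should_reuse_cache(
    name: str,
    output_root: str = "output",
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask the user whether to reuse cached data; never asks when none exists."""
    count = count_cached_pages(name, output_root)
    if count == 0:
        return False
    answer = ask(f"Found {count} cached pages for '{name}'. Reuse them? (y/n): ")
    return answer.strip().lower().startswith("y")
```
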
### Language Detection
Detects code languages from:
1. CSS class attributes (`language-*`, `lang-*`)
2. Heuristics (keywords like `def`, `const`, `func`, etc.)

See: `detect_language()` in doc_scraper.py:135-165

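The two-step lookup described above might look roughly like this. It is a sketch, not the body of `detect_language()`: the name `guess_language`, the `KEYWORD_HINTS` table, and the `"unknown"` fallback are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical keyword hints, checked in order; the real heuristics may differ.
KEYWORD_HINTS: List[Tuple[str, Sequence[str]]] = [
    ("python", ("def ", "import ")),
    ("javascript", ("const ", "function ", "=> ")),
    ("gdscript", ("func ", "extends ")),
]


def guess_language(css_classes: List[str], code: str) -> str:
    """Guess a code block's language from its CSS classes, then from keywords."""
    # Step 1: class names such as "language-python" or "lang-js" win outright.
    for css_class in css_classes:
        for prefix in ("language-", "lang-"):
            if css_class.startswith(prefix):
                return css_class[len(prefix):]

    # Step 2: fall back to simple keyword heuristics on the code text.
    for language, hints in KEYWORD_HINTS:
        if any(hint in code for hint in hints):
            return language

    return "unknown"
```

Checking the class attribute first means an explicit annotation from the documentation site always overrides the keyword guess.
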
### Pattern Extraction
Looks for "Example:", "Pattern:", and "Usage:" markers in the content and extracts the code blocks that follow them (up to 5 per page).

See: `extract_patterns()` in doc_scraper.py:167-183

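One way to picture the marker-based extraction is to walk a page as an ordered list of text and code blocks. The sketch below assumes that simplified representation, a list of `("text" | "code", content)` tuples, which is hypothetical; `find_patterns` is likewise an illustrative name, and the real `extract_patterns()` works on parsed HTML and may pair markers with code differently.

```python
from typing import Dict, List, Optional, Tuple

MARKERS = ("example:", "pattern:", "usage:")
MAX_PATTERNS_PER_PAGE = 5


def find_patterns(blocks: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Pair each marker paragraph with the code block that directly follows it."""
    patterns: List[Dict[str, str]] = []
    pending: Optional[str] = None  # marker text waiting for its code block
    for kind, content in blocks:
        if kind == "text":
            has_marker = any(marker in content.lower() for marker in MARKERS)
            pending = content.strip() if has_marker else None
        elif kind == "code" and pending is not None:
            patterns.append({"description": pending, "code": content})
            pending = None
            if len(patterns) == MAX_PATTERNS_PER_PAGE:
                break
    return patterns
```
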
### Smart Categorization
- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
- A page is assigned to a category only when its score is 2 or higher
- Auto-infers categories from URL segments if none provided
- Falls back to "other" category

See: `smart_categorize()` and `infer_categories()` in doc_scraper.py:282-351

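The scoring rule above can be illustrated with a small sketch. The weights (3, 2, 1) and the threshold of 2 come from this document; the function names, the page dict layout with `url`, `title`, and `content` keys, and the choice to pick the highest-scoring category are assumptions, so `smart_categorize()` itself may resolve ties or multiple matches differently.

```python
from typing import Dict, List


def score_page(page: Dict[str, str], keywords: List[str]) -> int:
    """Score a page: 3 points per URL match, 2 per title match, 1 per content match."""
    url = page.get("url", "").lower()
    title = page.get("title", "").lower()
    content = page.get("content", "").lower()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in url:
            score += 3
        if keyword in title:
            score += 2
        if keyword in content:
            score += 1
    return score


def categorize(page: Dict[str, str], categories: Dict[str, List[str]], threshold: int = 2) -> str:
    """Return the best-scoring category at or above the threshold, else "other"."""
    best_name, best_score = "other", 0
    for name, keywords in categories.items():
        score = score_page(page, keywords)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name
```

Under these assumptions, a page that only mentions a keyword once in its body scores 1 and lands in "other", while a keyword in the URL alone is enough to categorize it.
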
### Enhanced SKILL.md Generation
Generated with:
- Real code examples from documentation (language-annotated)
- Quick reference patterns extracted from docs
- Common patterns section
- Category file listings

See: `create_enhanced_skill_md()` in doc_scraper.py:426-542

## Common Workflows

### First Time (With Scraping + Enhancement)

```bash
# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
python3 doc_scraper.py --config configs/godot.json --enhance-local

# 2. Wait for enhancement terminal to close (~60 seconds)

# 3. Verify quality
cat output/godot/SKILL.md

# 4. Package
python3 package_skill.py output/godot/

# Result: godot.zip ready for Claude
# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)
```

### Using Cached Data (Fast Iteration)

```bash
# 1. Use existing data + Local Enhancement
python3 doc_scraper.py --config configs/godot.json --skip-scrape
python3 enhance_skill_local.py output/godot/

# 2. Package
python3 package_skill.py output/godot/

# Time: 1-3 minutes (build) + 60 seconds (enhancement)
```

### Without Enhancement (Basic)

```bash
# 1. Scrape + Build (no enhancement)
python3 doc_scraper.py --config configs/godot.json

# 2. Package
python3 package_skill.py output/godot/

# Note: SKILL.md will be basic template - enhancement recommended
# Time: 20-40 minutes
```

### Creating a New Framework Config

**Option 1: Interactive**
```bash
python3 doc_scraper.py --interactive
# Follow prompts, it creates the config for you
```

**Option 2: Copy and Modify**
```bash
# Copy a preset
cp configs/react.json configs/myframework.json

# Edit it
nano configs/myframework.json

# Test with limited pages first
# Set "max_pages": 20 in config

# Use it
python3 doc_scraper.py --config configs/myframework.json
```

## Testing & Verification

### Finding the Right CSS Selectors

Before creating a config, test selectors with BeautifulSoup:

```python
from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Fail early on 4xx/5xx responses
soup = BeautifulSoup(response.content, 'html.parser')

# Try different selectors (select_one returns None when nothing matches)
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('div.content'))

# Test code block selector (select returns a list, empty when nothing matches)
print(soup.select('pre code'))
print(soup.select('pre'))
```

### Verify Output Quality

After building, verify the skill quality:

```bash
# Check SKILL.md has real examples
cat output/godot/SKILL.md

# Check category structure
cat output/godot/references/index.md

# List all reference files
ls output/godot/references/

# Check specific category content
cat output/godot/references/getting_started.md

# Verify code samples have language detection
# (single quotes keep the shell from treating the backticks as command substitution)
grep -A 3 '```' output/godot/references/*.md | head -20
```

### Test with Limited Pages

For faster testing, lower `max_pages` in the config, for example to just 20 pages:

```json
{
  "max_pages": 20
}
```

## Troubleshooting

### No Content Extracted
**Problem:** Pages scraped but content is empty

**Solution:** Check `main_content` selector in config. Try:
- `article`
- `main`
- `div[role="main"]`
- `div.content`

Use the BeautifulSoup testing approach above to find the right selector.

### Poor Categorization
**Problem:** Pages not categorized well

**Solution:** Edit `categories` section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data:

```bash
# See what URLs were scraped
cat output/godot_data/summary.json | grep url | head -20
```

### Data Exists But Won't Use It
**Problem:** Tool won't reuse existing data

**Solution:** Force re-scrape:
```bash
rm -rf output/myframework_data/
python3 doc_scraper.py --config configs/myframework.json
```

### Rate Limiting Issues
**Problem:** Getting rate limited or blocked by documentation server

**Solution:** Increase the `rate_limit` value in the config, for example from 0.5 to 1.0 seconds:
```json
{
  "rate_limit": 1.0
}
```

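Since `rate_limit` is described as a delay between requests, its effect can be sketched as a fixed pause before every request after the first. The names `fetch_all`, `fetch`, and `sleep` below are hypothetical and the loop is only an assumption about how such a delay is typically applied, not the scraper's actual request loop.

```python
import time
from typing import Callable, List, Sequence


def fetch_all(
    urls: Sequence[str],
    fetch: Callable[[str], str],
    rate_limit: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs in order, pausing rate_limit seconds between requests."""
    results: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(rate_limit)  # no pause is needed before the first request
        results.append(fetch(url))
    return results
```

With this shape, doubling `rate_limit` from 0.5 to 1.0 adds roughly half a second per page, so a 500-page scrape would take about four minutes longer.
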
### Package Path Error
**Problem:** doc_scraper.py shows wrong package_skill.py path

**Expected output:**
```bash
python3 package_skill.py output/godot/
```

**Not:**
```bash
python3 /mnt/skills/examples/skill-creator/scripts/package_skill.py output/godot/
```

The correct command uses the local `package_skill.py` in the repository root.

## Key Code Locations

- **URL validation**: `is_valid_url()` doc_scraper.py:49-64
- **Content extraction**: `extract_content()` doc_scraper.py:66-133
- **Language detection**: `detect_language()` doc_scraper.py:135-165
- **Pattern extraction**: `extract_patterns()` doc_scraper.py:167-183
- **Smart categorization**: `smart_categorize()` doc_scraper.py:282-323
- **Category inference**: `infer_categories()` doc_scraper.py:325-351
- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:353-372
- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:426-542
- **Scraping loop**: `scrape_all()` doc_scraper.py:228-251
- **Main workflow**: `main()` doc_scraper.py:663-789

## Enhancement Details

### LOCAL Enhancement (Recommended)
- Uses your Claude Code Max plan (no API costs)
- Opens new terminal with Claude Code
- Analyzes reference files automatically
- Takes 30-60 seconds
- Quality: 9/10 (comparable to API version)
- Backs up original SKILL.md to SKILL.md.backup

### API Enhancement (Alternative)
- Uses Anthropic API (~$0.15-$0.30 per skill)
- Requires ANTHROPIC_API_KEY
- Same quality as LOCAL
- Faster (no terminal launch)
- Better for automation/CI

**What Enhancement Does:**
1. Reads reference documentation files
2. Analyzes content with Claude
3. Extracts 5-10 best code examples
4. Creates comprehensive quick reference
5. Adds domain-specific key concepts
6. Provides navigation guidance for different skill levels
7. Transforms 75-line templates into 500+ line comprehensive guides

## Performance

| Task | Time | Notes |
|------|------|-------|
| Scraping | 15-45 min | First time only |
| Building | 1-3 min | Fast! |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final zip |

## Additional Documentation

- **[README.md](README.md)** - Complete user documentation
- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps
- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Detailed technical architecture
- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure

## Notes for Claude Code

- This is a Python-based documentation scraper
- Single-file design (`doc_scraper.py` ~790 lines)
- No build system, no tests, minimal dependencies
- Output is cached and reusable
- Enhancement is optional but highly recommended
- All scraped data stored in `output/` (git-ignored)