yusyus
2025-10-17 15:14:44 +00:00
parent 397d47fe7c
commit 78b9cae398
19 changed files with 3061 additions and 3 deletions

239
docs/CLAUDE.md Normal file

@@ -0,0 +1,239 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a Python-based documentation scraper that converts ANY documentation website into a Claude skill. It's a single-file tool (`doc_scraper.py`) that scrapes documentation, extracts code patterns, detects programming languages, and generates structured skill files ready for use with Claude.
## Dependencies
```bash
pip3 install requests beautifulsoup4
```
## Core Commands
### Run with a preset configuration
```bash
python3 doc_scraper.py --config configs/godot.json
python3 doc_scraper.py --config configs/react.json
python3 doc_scraper.py --config configs/vue.json
python3 doc_scraper.py --config configs/django.json
python3 doc_scraper.py --config configs/fastapi.json
```
### Interactive mode (for new frameworks)
```bash
python3 doc_scraper.py --interactive
```
### Quick mode (minimal config)
```bash
python3 doc_scraper.py --name react --url https://react.dev/ --description "React framework"
```
### Skip scraping (use cached data)
```bash
python3 doc_scraper.py --config configs/godot.json --skip-scrape
```
### AI-powered SKILL.md enhancement
```bash
# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python3 doc_scraper.py --config configs/react.json --enhance
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
python3 doc_scraper.py --config configs/react.json --enhance-local
# Option 3: Standalone after scraping (API-based)
python3 enhance_skill.py output/react/
# Option 4: Standalone after scraping (LOCAL, no API key)
python3 enhance_skill_local.py output/react/
```
The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes the reference files and enhances SKILL.md automatically. It requires a Claude Code Max plan but no API key.
### Test with limited pages (edit config first)
Set `"max_pages": 20` in the config file to test with fewer pages.
## Architecture
### Single-File Design
The entire tool is contained in `doc_scraper.py` (~737 lines). It follows a class-based architecture with a single `DocToSkillConverter` class that handles:
- **Web scraping**: BFS traversal with URL validation
- **Content extraction**: CSS selectors for title, content, code blocks
- **Language detection**: Heuristic-based detection from code samples (Python, JavaScript, GDScript, C++, etc.)
- **Pattern extraction**: Identifies common coding patterns from documentation
- **Categorization**: Smart categorization using URL structure, page titles, and content keywords with scoring
- **Skill generation**: Creates SKILL.md with real code examples and categorized reference files
### Data Flow
1. **Scrape Phase**:
   - Input: Config JSON (name, base_url, selectors, url_patterns, categories, rate_limit, max_pages)
   - Process: BFS traversal starting from base_url, respecting include/exclude patterns
   - Output: `output/{name}_data/pages/*.json` + `summary.json`
2. **Build Phase**:
   - Input: Scraped JSON data from `output/{name}_data/`
   - Process: Load pages → Smart categorize → Extract patterns → Generate references
   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`
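The scrape phase can be pictured with a minimal sketch like the one below. It is not the code in `doc_scraper.py`: the function bodies, the `get_links` callback, and the in-memory `SITE` graph are all assumptions made for illustration, and the prefix check against `base_url` is a guess at what URL validation involves. Real scraping would fetch pages over HTTP and sleep for `rate_limit` seconds between requests.

```python
from collections import deque

def is_valid_url(url, base_url, include, exclude):
    """Keep URLs under base_url that match an include pattern and no exclude pattern."""
    if not url.startswith(base_url):  # assumption: stay within the docs site
        return False
    if include and not any(pattern in url for pattern in include):
        return False
    return not any(pattern in url for pattern in exclude)

def crawl(base_url, get_links, include=(), exclude=(), max_pages=100):
    """Breadth-first traversal; get_links(url) returns the links found on that page."""
    queue = deque([base_url])
    seen = {base_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen and is_valid_url(link, base_url, include, exclude):
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical link graph standing in for real HTTP requests.
SITE = {
    "https://docs.example.com/": [
        "https://docs.example.com/guide/",
        "https://docs.example.com/blog/news",
        "https://other.example.org/",
    ],
    "https://docs.example.com/guide/": [
        "https://docs.example.com/guide/install",
        "https://docs.example.com/",
    ],
    "https://docs.example.com/guide/install": [],
}

pages = crawl("https://docs.example.com/", lambda url: SITE.get(url, []), exclude=["/blog/"])
print(pages)
```

Using a queue rather than recursion visits pages level by level, so a low `max_pages` still yields the pages closest to the starting URL.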
### Directory Structure
```
doc-to-skill/
├── doc_scraper.py              # Main scraping & building tool
├── enhance_skill.py            # AI enhancement (API-based)
├── enhance_skill_local.py      # AI enhancement (LOCAL, no API)
├── configs/                    # Preset configurations
│   ├── godot.json
│   ├── react.json
│   ├── steam-inventory.json
│   └── ...
└── output/
    ├── {name}_data/            # Raw scraped data (cached)
    │   ├── pages/              # Individual page JSONs
    │   └── summary.json        # Scraping summary
    └── {name}/                 # Generated skill
        ├── SKILL.md            # Main skill file with examples
        ├── SKILL.md.backup     # Backup (if enhanced)
        ├── references/         # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/            # Empty (for user scripts)
        └── assets/             # Empty (for user assets)
```
### Configuration Format
Config files in `configs/*.json` contain:
- `name`: Skill identifier (e.g., "godot", "react")
- `description`: When to use this skill
- `base_url`: Starting URL for scraping
- `selectors`: CSS selectors for content extraction
  - `main_content`: Main documentation content (e.g., "article", "div[role='main']")
  - `title`: Page title selector
  - `code_blocks`: Code sample selector (e.g., "pre code", "pre")
- `url_patterns`: URL filtering
  - `include`: Only scrape URLs containing these patterns
  - `exclude`: Skip URLs containing these patterns
- `categories`: Keyword-based categorization mapping
- `rate_limit`: Delay between requests (seconds)
- `max_pages`: Maximum pages to scrape
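As an illustration, the fields above could be assembled as follows. Every value here is hypothetical (the framework name, URLs, selectors, and keywords are invented), and the shape of `categories` as a mapping from category name to keyword list is an assumption. The `missing_keys` helper is not part of the tool; it only checks that the documented top-level keys are present.

```python
import json

# Hypothetical config for an imaginary framework; all values are illustrative.
config = {
    "name": "myframework",
    "description": "Use when working with MyFramework APIs",
    "base_url": "https://docs.example.com/",
    "selectors": {
        "main_content": "article",
        "title": "h1",
        "code_blocks": "pre code",
    },
    "url_patterns": {
        "include": ["/guide/", "/api/"],
        "exclude": ["/blog/"],
    },
    # Assumed shape: category name -> keywords that indicate it.
    "categories": {
        "getting_started": ["intro", "install"],
        "api": ["api", "reference"],
    },
    "rate_limit": 0.5,
    "max_pages": 20,
}

DOCUMENTED_KEYS = (
    "name", "description", "base_url", "selectors",
    "url_patterns", "categories", "rate_limit", "max_pages",
)

def missing_keys(cfg):
    """Return the documented top-level keys that cfg does not define."""
    return [key for key in DOCUMENTED_KEYS if key not in cfg]

# The JSON printed here is what would be saved as configs/myframework.json.
print(json.dumps(config, indent=2))
```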
### Key Features
**Auto-detect existing data**: The tool checks for `output/{name}_data/` and prompts to reuse it, avoiding re-scraping.
**Language detection**: Detects code languages from:
1. CSS class attributes (`language-*`, `lang-*`)
2. Heuristics (keywords like `def`, `const`, `func`, etc.)
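A rough sketch of this two-step detection is shown below. It is an assumption-laden illustration, not the body of `detect_language()` in `doc_scraper.py`: the keyword table, the order in which languages are tried, and the `"unknown"` fallback are all invented for the example.

```python
def detect_language(css_classes, code):
    """Return a language name from class hints first, then from keyword heuristics."""
    # Step 1: explicit hints such as class="language-python" or class="lang-js".
    for css_class in css_classes:
        for prefix in ("language-", "lang-"):
            if css_class.startswith(prefix):
                return css_class[len(prefix):]
    # Step 2: keyword heuristics. This table is hypothetical and far from complete.
    heuristics = [
        ("python", ("def ", "import ")),
        ("javascript", ("const ", "function ", "=>")),
        ("gdscript", ("func ", "extends ")),
    ]
    for language, keywords in heuristics:
        if any(keyword in code for keyword in keywords):
            return language
    return "unknown"

print(detect_language(["highlight", "language-cpp"], "int main() {}"))
print(detect_language([], "func _ready():\n    pass"))
```

Checking class attributes first is the safer order, since keyword matching can misfire on short snippets that share tokens across languages.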
**Pattern extraction**: Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
**Smart categorization**:
- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
- Threshold of 2+ for categorization
- Auto-infers categories from URL segments if none provided
- Falls back to "other" category
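The scoring rules above can be sketched as follows. This is an illustration under assumptions, not the code of `smart_categorize()` or `infer_categories()`: the page dictionary layout, the choice of the highest-scoring category, and the use of the first URL path segment for inference are all guesses.

```python
from urllib.parse import urlparse

def score_page(page, keywords):
    """3 points per keyword found in the URL, 2 in the title, 1 in the content."""
    url, title, content = (page[key].lower() for key in ("url", "title", "content"))
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        score += 3 * (keyword in url) + 2 * (keyword in title) + (keyword in content)
    return score

def categorize(page, categories, threshold=2):
    """Pick the best-scoring category at or above the threshold, else 'other'."""
    best_name, best_score = "other", 0
    for name, keywords in categories.items():
        score = score_page(page, keywords)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name

def infer_categories(urls):
    """Assumed inference rule: the first path segment becomes a category keyword."""
    categories = {}
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        if segments:
            categories.setdefault(segments[0], [segments[0]])
    return categories

categories = {"getting_started": ["install", "intro"], "api": ["api", "reference"]}
page = {
    "url": "https://docs.example.com/api/auth",
    "title": "Authentication",
    "content": "Install the client first.",
}
print(categorize(page, categories))
```

In this example the URL match on `api` scores 3, while `install` appears only in the content and scores 1, which is below the threshold of 2.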
**Enhanced SKILL.md**: Generated with:
- Real code examples from documentation (language-annotated)
- Quick reference patterns extracted from docs
- Common pattern section
- Category file listings
**AI-Powered Enhancement**: Two scripts to dramatically improve SKILL.md quality:
- `enhance_skill.py`: Uses Anthropic API (~$0.15-$0.30 per skill, requires API key)
- `enhance_skill_local.py`: Uses Claude Code Max (free, no API key needed)
- Transforms generic 75-line templates into comprehensive 500+ line guides
- Extracts best examples, explains key concepts, adds navigation guidance
- Quality rating: 9/10 (based on a single test, the steam-economy skill)
## Key Code Locations
- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
- **Content extraction**: `extract_content()` doc_scraper.py:64-131
- **Language detection**: `detect_language()` doc_scraper.py:133-163
- **Pattern extraction**: `extract_patterns()` doc_scraper.py:165-181
- **Smart categorization**: `smart_categorize()` doc_scraper.py:280-321
- **Category inference**: `infer_categories()` doc_scraper.py:323-349
- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:351-370
- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:424-540
- **Scraping loop**: `scrape_all()` doc_scraper.py:226-249
- **Main workflow**: `main()` doc_scraper.py:661-733
## Workflow Examples
### First-time run (scrape + build)
```bash
# 1. Scrape + Build
python3 doc_scraper.py --config configs/godot.json
# Time: 20-40 minutes
# 2. Package (assuming skill-creator is available)
python3 package_skill.py output/godot/
# Result: godot.zip
```
### Using cached data (fast iteration)
```bash
# 1. Use existing data
python3 doc_scraper.py --config configs/godot.json --skip-scrape
# Time: 1-3 minutes
# 2. Package
python3 package_skill.py output/godot/
```
### Creating a new framework config
```bash
# Option 1: Interactive
python3 doc_scraper.py --interactive
# Option 2: Copy and modify
cp configs/react.json configs/myframework.json
# Edit configs/myframework.json
python3 doc_scraper.py --config configs/myframework.json
```
## Testing Selectors
To find the right CSS selectors for a documentation site:
```python
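import requests
from bs4 import BeautifulSoup

url = "https://docs.example.com/page"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.content, "html.parser")

# Try different selectors and preview the text each one matches
for selector in ("article", "main", 'div[role="main"]'):
    element = soup.select_one(selector)
    print(selector, "->", element.get_text(strip=True)[:80] if element else "no match")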
```
## Troubleshooting
**No content extracted**: Check `main_content` selector. Common values: `article`, `main`, `div[role="main"]`, `div.content`
**Poor categorization**: Edit `categories` section in config with better keywords specific to the documentation structure
**Force re-scrape**: Delete cached data with `rm -rf output/{name}_data/`
**Rate limiting issues**: Increase `rate_limit` value in config (e.g., from 0.5 to 1.0 seconds)
## Output Quality Checks
After building, verify quality:
```bash
cat output/godot/SKILL.md # Should have real code examples
cat output/godot/references/index.md # Should show categories
ls output/godot/references/ # Should have category .md files
```
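The same checks can be scripted. The sketch below is a hypothetical helper, not something shipped with the tool; it assumes that "real code examples" can be approximated by the presence of at least one fenced code block in SKILL.md.

```python
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence marker

def check_skill(skill_dir):
    """Return a list of problems; an empty list means all checks passed."""
    skill_dir = Path(skill_dir)
    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md is missing")
    elif FENCE not in skill_md.read_text(encoding="utf-8"):
        problems.append("SKILL.md contains no fenced code examples")
    references = skill_dir / "references"
    if not (references / "index.md").is_file():
        problems.append("references/index.md is missing")
    category_files = (
        [path for path in references.glob("*.md") if path.name != "index.md"]
        if references.is_dir()
        else []
    )
    if not category_files:
        problems.append("no category files in references/")
    return problems

for problem in check_skill("output/godot"):
    print("WARNING:", problem)
```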

250
docs/ENHANCEMENT.md Normal file

@@ -0,0 +1,250 @@
# AI-Powered SKILL.md Enhancement
Two scripts are available to dramatically improve your SKILL.md file:
1. **`enhance_skill_local.py`** - Uses Claude Code Max (no API key, **recommended**)
2. **`enhance_skill.py`** - Uses Anthropic API (~$0.15-$0.30 per skill)
Both analyze reference documentation and extract the best examples and guidance.
## Why Use Enhancement?
**Problem:** The auto-generated SKILL.md is often too generic:
- Empty Quick Reference section
- No practical code examples
- Generic "When to Use" triggers
- Doesn't highlight key features
**Solution:** Let Claude read your reference docs and create a much better SKILL.md with:
- ✅ Best code examples extracted from documentation
- ✅ Practical quick reference with real patterns
- ✅ Domain-specific guidance
- ✅ Clear navigation tips
- ✅ Key concepts explained
## Quick Start (LOCAL - No API Key)
**Recommended for Claude Code Max users:**
```bash
# Option 1: Standalone enhancement
python3 enhance_skill_local.py output/steam-inventory/
# Option 2: Integrated with scraper
python3 doc_scraper.py --config configs/steam-inventory.json --enhance-local
```
**What happens:**
1. Opens new terminal window
2. Runs Claude Code with enhancement prompt
3. Claude analyzes reference files (~15-20K chars)
4. Generates enhanced SKILL.md (30-60 seconds)
5. Terminal auto-closes when done
**Requirements:**
- Claude Code Max plan (you're already using it!)
- macOS for auto-launch; on other operating systems, run it manually in a terminal
## API-Based Enhancement (Alternative)
**If you prefer the API-based approach:**
### Installation
```bash
pip3 install anthropic
```
### Setup API Key
```bash
# Option 1: Environment variable (recommended)
export ANTHROPIC_API_KEY=sk-ant-...
# Option 2: Pass directly with --api-key
python3 enhance_skill.py output/react/ --api-key sk-ant-...
```
### Usage
```bash
# Standalone enhancement
python3 enhance_skill.py output/steam-inventory/
# Integrated with scraper
python3 doc_scraper.py --config configs/steam-inventory.json --enhance
# Dry run (see what would be done)
python3 enhance_skill.py output/react/ --dry-run
```
## What It Does
1. **Reads reference files** (api_reference.md, webapi.md, etc.)
2. **Sends to Claude** with instructions to:
   - Extract 5-10 best code examples
   - Create practical quick reference
   - Write domain-specific "When to Use" triggers
   - Add helpful navigation guidance
3. **Backs up original** SKILL.md to SKILL.md.backup
4. **Saves enhanced version** as new SKILL.md
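Steps 3 and 4 amount to a copy followed by an overwrite. A minimal sketch, assuming a hypothetical `save_enhanced` helper (the real scripts may structure this differently):

```python
import shutil
from pathlib import Path

def save_enhanced(skill_dir, enhanced_text):
    """Copy SKILL.md to SKILL.md.backup, then write the enhanced text in its place."""
    skill_md = Path(skill_dir) / "SKILL.md"
    backup = skill_md.with_name("SKILL.md.backup")
    if skill_md.exists():
        shutil.copy2(skill_md, backup)  # copy2 also preserves timestamps
    skill_md.write_text(enhanced_text, encoding="utf-8")
    return backup
```

One consequence of this simple scheme is that a second run would overwrite the backup with the previously enhanced file, so the auto-generated original is only recoverable after the first run unless it is copied elsewhere.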
## Example Enhancement
### Before (Auto-Generated)
```markdown
## Quick Reference
### Common Patterns
*Quick reference patterns will be added as you use the skill.*
```
### After (AI-Enhanced)
```markdown
## Quick Reference
### Common API Patterns
**Granting promotional items:**
```cpp
void CInventory::GrantPromoItems()
{
    SteamItemDef_t newItems[2];
    newItems[0] = 110;
    newItems[1] = 111;
    SteamInventory()->AddPromoItems( &s_GenerateRequestResult, newItems, 2 );
}
```
**Getting all items in player inventory:**
```cpp
SteamInventoryResult_t resultHandle;
bool success = SteamInventory()->GetAllItems( &resultHandle );
```
[... 8 more practical examples ...]
```
## Cost Estimate
- **Input**: ~50,000-100,000 tokens (reference docs)
- **Output**: ~4,000 tokens (enhanced SKILL.md)
- **Model**: claude-sonnet-4-20250514
- **Estimated cost**: $0.15-$0.30 per skill
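The arithmetic behind such an estimate can be sketched as below. The per-token prices are assumptions (3 USD per million input tokens and 15 USD per million output tokens), not figures taken from this document; check current Anthropic pricing before relying on them.

```python
# Assumed prices in USD per million tokens; verify against current pricing.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in USD for one enhancement run."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000

low = estimate_cost(50_000, 4_000)
high = estimate_cost(100_000, 4_000)
print(f"${low:.2f} - ${high:.2f}")  # prints "$0.21 - $0.36"
```

Under these assumed prices, the input tokens alone account for $0.15 to $0.30, which matches the range quoted above; the roughly 4,000 output tokens add about $0.06 per run.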
## Troubleshooting
### "No API key provided"
```bash
export ANTHROPIC_API_KEY=sk-ant-...
# or
python3 enhance_skill.py output/react/ --api-key sk-ant-...
```
### "No reference files found"
Make sure you've run the scraper first:
```bash
python3 doc_scraper.py --config configs/react.json
```
### "anthropic package not installed"
```bash
pip3 install anthropic
```
### Don't like the result?
```bash
# Restore original
mv output/steam-inventory/SKILL.md.backup output/steam-inventory/SKILL.md
# Try again (it may generate different content)
python3 enhance_skill.py output/steam-inventory/
```
## Tips
1. **Run after scraping completes** - Enhancement works best with complete reference docs
2. **Review the output** - AI is good but not perfect, so check the generated SKILL.md
3. **Keep the backup** - Original is saved as SKILL.md.backup
4. **Re-run if needed** - Each run may produce slightly different results
5. **No re-scrape needed** - Reference files are stored locally, so you can re-run enhancement without scraping again
## Real-World Results
**Test Case: steam-economy skill**
- **Before:** 75 lines, generic template, empty Quick Reference
- **After:** 570 lines, 10 practical API examples, key concepts explained
- **Time:** 60 seconds
- **Quality Rating:** 9/10
The LOCAL enhancement successfully:
- Extracted best HTTP/JSON examples from 24 pages of documentation
- Explained domain concepts (Asset Classes, Context IDs, Transaction Lifecycle)
- Created navigation guidance for beginners through advanced users
- Added best practices for security, economy design, and API integration
## Limitations
**LOCAL Enhancement (`enhance_skill_local.py`):**
- Requires Claude Code Max plan
- macOS auto-launch only (manual on other OS)
- Opens new terminal window
- Takes ~60 seconds
**API Enhancement (`enhance_skill.py`):**
- Requires Anthropic API key (paid)
- Cost: ~$0.15-$0.30 per skill
- Limited to ~100K tokens of reference input
**Both:**
- May occasionally miss the best examples
- Can't understand context beyond the reference docs
- Doesn't modify reference files (only SKILL.md)
## Enhancement Options Comparison
| Aspect | Manual Edit | LOCAL Enhancement | API Enhancement |
|--------|-------------|-------------------|-----------------|
| Time | 15-30 minutes | 30-60 seconds | 30-60 seconds |
| Code examples | You pick | AI picks best | AI picks best |
| Quick reference | Write yourself | Auto-generated | Auto-generated |
| Domain guidance | Your knowledge | From docs | From docs |
| Consistency | Varies | Consistent | Consistent |
| Cost | Free (your time) | Free (Max plan) | ~$0.20 per skill |
| Setup | None | None | API key needed |
| Quality | High (if expert) | 9/10 | 9/10 |
| **Recommended?** | For experts only | ✅ **Yes** | If no Max plan |
## When to Use
**Use enhancement when:**
- You want high-quality SKILL.md quickly
- Working with large documentation (50+ pages)
- Creating skills for unfamiliar frameworks
- Need practical code examples extracted
- Want consistent quality across multiple skills
**Skip enhancement when:**
- Budget constrained (use manual editing)
- Very small documentation (<10 pages)
- You know the framework intimately
- Documentation has no code examples
## Advanced: Customization
To customize how Claude enhances the SKILL.md, edit `enhance_skill.py` and modify the `_build_enhancement_prompt()` method around line 130.
Example customization:
```python
prompt += """
ADDITIONAL REQUIREMENTS:
- Focus on security best practices
- Include performance tips
- Add troubleshooting section
"""
```
## See Also
- [README.md](../README.md) - Main documentation
- [CLAUDE.md](CLAUDE.md) - Architecture guide
- [doc_scraper.py](../doc_scraper.py) - Main scraping tool

252
docs/UPLOAD_GUIDE.md Normal file

@@ -0,0 +1,252 @@
# How to Upload Skills to Claude
## Quick Answer
**You upload the `.zip` file created by `package_skill.py`**
```bash
# Create the zip file
python3 package_skill.py output/steam-economy/
# This creates: output/steam-economy.zip
# Upload this file to Claude!
```
## What's Inside the Zip?
The `.zip` file contains:
```
steam-economy.zip
├── SKILL.md                 ← Main skill file (Claude reads this first)
└── references/              ← Reference documentation
    ├── index.md             ← Category index
    ├── api_reference.md     ← API docs
    ├── pricing.md           ← Pricing docs
    ├── trading.md           ← Trading docs
    └── ...                  ← Other categorized docs
```
**Note:** The zip only includes what Claude needs. It excludes:
- `.backup` files
- Build artifacts
- Temporary files
## What Does package_skill.py Do?
The package script:
1. **Finds your skill directory** (e.g., `output/steam-economy/`)
2. **Validates SKILL.md exists** (required!)
3. **Creates a .zip file** with the same name
4. **Includes all files** except backups
5. **Saves to** `output/` directory
**Example:**
```bash
python3 package_skill.py output/steam-economy/
📦 Packaging skill: steam-economy
Source: output/steam-economy
Output: output/steam-economy.zip
+ SKILL.md
+ references/api_reference.md
+ references/pricing.md
+ references/trading.md
+ ...
✅ Package created: output/steam-economy.zip
Size: 14,290 bytes (14.0 KB)
```
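The packaging steps above could be approximated with the standard `zipfile` module. This is a sketch under assumptions, not the source of `package_skill.py`: the function name is reused only for readability, and the exclusion rule (skip any file ending in `.backup`) is inferred from the description.

```python
import zipfile
from pathlib import Path

def package_skill(skill_dir):
    """Zip a skill directory next to itself, skipping *.backup files."""
    skill_dir = Path(skill_dir)
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"SKILL.md not found in {skill_dir}")
    zip_path = skill_dir.parent / f"{skill_dir.name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(skill_dir.rglob("*")):
            if path.is_file() and path.suffix != ".backup":
                # Store paths relative to the skill directory, so SKILL.md sits at the zip root.
                archive.write(path, path.relative_to(skill_dir).as_posix())
    return zip_path
```

Writing the archive to the parent directory keeps the zip out of the tree being walked, so it cannot end up inside itself.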
## Complete Workflow
### Step 1: Scrape & Build
```bash
python3 doc_scraper.py --config configs/steam-economy.json
```
**Output:**
- `output/steam-economy_data/` (raw scraped data)
- `output/steam-economy/` (skill directory)
### Step 2: Enhance (Recommended)
```bash
python3 enhance_skill_local.py output/steam-economy/
```
**What it does:**
- Analyzes reference files
- Creates comprehensive SKILL.md
- Backs up original to SKILL.md.backup
**Output:**
- `output/steam-economy/SKILL.md` (enhanced)
- `output/steam-economy/SKILL.md.backup` (original)
### Step 3: Package
```bash
python3 package_skill.py output/steam-economy/
```
**Output:**
- `output/steam-economy.zip` ← **THIS IS WHAT YOU UPLOAD**
### Step 4: Upload to Claude
1. Go to Claude (claude.ai)
2. Click "Add Skill" or the skill upload button
3. Select `output/steam-economy.zip`
4. Done!
## What Files Are Required?
**Minimum required structure:**
```
your-skill/
└── SKILL.md ← Required! Claude reads this first
```
**Recommended structure:**
```
your-skill/
├── SKILL.md        ← Main skill file (required)
└── references/     ← Reference docs (highly recommended)
    ├── index.md
    └── *.md        ← Category files
```
**Optional (can add manually):**
```
your-skill/
├── SKILL.md
├── references/
├── scripts/        ← Helper scripts
│   └── *.py
└── assets/         ← Templates, examples
    └── *.txt
```
## File Size Limits
The package script shows size after packaging:
```
✅ Package created: output/steam-economy.zip
Size: 14,290 bytes (14.0 KB)
```
**Typical sizes:**
- Small skill: 5-20 KB
- Medium skill: 20-100 KB
- Large skill: 100-500 KB
Claude has generous size limits, so most documentation-based skills fit easily.
## Quick Reference
### Package a Skill
```bash
python3 package_skill.py output/steam-economy/
```
### Package Multiple Skills
```bash
# Package all skills in output/
for dir in output/*/; do
    if [ -f "$dir/SKILL.md" ]; then
        python3 package_skill.py "$dir"
    fi
done
```
### Check What's in a Zip
```bash
unzip -l output/steam-economy.zip
```
### Test a Packaged Skill Locally
```bash
# Extract to temp directory
mkdir temp-test
unzip output/steam-economy.zip -d temp-test/
cat temp-test/SKILL.md
```
## Troubleshooting
### "SKILL.md not found"
```bash
# Make sure you scraped and built first
python3 doc_scraper.py --config configs/steam-economy.json
# Then package
python3 package_skill.py output/steam-economy/
```
### "Directory not found"
```bash
# Check what skills are available
ls output/
# Use correct path
python3 package_skill.py output/YOUR-SKILL-NAME/
```
### Zip is Too Large
Most skills are small, but if yours is large:
```bash
# Check size
ls -lh output/steam-economy.zip
# If needed, check what's taking space
unzip -l output/steam-economy.zip | sort -k1 -rn | head -20
```
Reference files are usually small. Large sizes often mean:
- Many images (skills typically don't need images)
- Large code examples (these are fine, just be aware)
## What Does Claude Do With the Zip?
When you upload a skill zip:
1. **Claude extracts it**
2. **Reads SKILL.md first** - This tells Claude:
   - When to activate this skill
   - What the skill does
   - Quick reference examples
   - How to navigate the references
3. **Indexes reference files** - Claude can search through:
   - `references/*.md` files
   - Find specific APIs, examples, concepts
4. **Activates automatically** - When you ask about topics matching the skill
## Example: Using the Packaged Skill
After uploading `steam-economy.zip`:
**You ask:** "How do I implement microtransactions in my Steam game?"
**Claude:**
- Recognizes this matches steam-economy skill
- Reads SKILL.md for quick reference
- Searches references/microtransactions.md
- Provides detailed answer with code examples
## Summary
**What you need to do:**
1. ✅ Scrape: `python3 doc_scraper.py --config configs/YOUR-CONFIG.json`
2. ✅ Enhance: `python3 enhance_skill_local.py output/YOUR-SKILL/`
3. ✅ Package: `python3 package_skill.py output/YOUR-SKILL/`
4. ✅ Upload: Upload the `.zip` file to Claude
**What you upload:**
- The `.zip` file from `output/` directory
- Example: `output/steam-economy.zip`
**What's in the zip:**
- `SKILL.md` (required)
- `references/*.md` (recommended)
- Any scripts/assets you added (optional)
That's it! 🚀