Init

2025-10-17 15:14:44 +00:00
parent 397d47fe7c
commit 78b9cae398
19 changed files with 3061 additions and 3 deletions
--- a/docs/CLAUDE.md
+++ b/docs/CLAUDE.md
@@ -0,0 +1,239 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+This is a Python-based documentation scraper that converts a documentation website into a Claude skill. The core is a single-file tool (`doc_scraper.py`) that scrapes documentation, extracts code patterns, detects programming languages, and generates structured skill files ready for use with Claude.
+
+## Dependencies
+
+```bash
+pip3 install requests beautifulsoup4
+```
+
+## Core Commands
+
+### Run with a preset configuration
+```bash
+python3 doc_scraper.py --config configs/godot.json
+python3 doc_scraper.py --config configs/react.json
+python3 doc_scraper.py --config configs/vue.json
+python3 doc_scraper.py --config configs/django.json
+python3 doc_scraper.py --config configs/fastapi.json
+```
+
+### Interactive mode (for new frameworks)
+```bash
+python3 doc_scraper.py --interactive
+```
+
+### Quick mode (minimal config)
+```bash
+python3 doc_scraper.py --name react --url https://react.dev/ --description "React framework"
+```
+
+### Skip scraping (use cached data)
+```bash
+python3 doc_scraper.py --config configs/godot.json --skip-scrape
+```
+
+### AI-powered SKILL.md enhancement
+```bash
+# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
+pip3 install anthropic
+export ANTHROPIC_API_KEY=sk-ant-...
+python3 doc_scraper.py --config configs/react.json --enhance
+
+# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
+python3 doc_scraper.py --config configs/react.json --enhance-local
+
+# Option 3: Standalone after scraping (API-based)
+python3 enhance_skill.py output/react/
+
+# Option 4: Standalone after scraping (LOCAL, no API key)
+python3 enhance_skill_local.py output/react/
+```
+
+The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes the reference files and enhances SKILL.md automatically. This requires a Claude Code Max plan but no API key.
+
+### Test with limited pages (edit config first)
+Set `"max_pages": 20` in the config file to test with fewer pages.
+
+## Architecture
+
+### Single-File Design
+The entire tool is contained in `doc_scraper.py` (~737 lines). It follows a class-based architecture with a single `DocToSkillConverter` class that handles:
+- **Web scraping**: BFS traversal with URL validation
+- **Content extraction**: CSS selectors for title, content, code blocks
+- **Language detection**: Heuristic-based detection from code samples (Python, JavaScript, GDScript, C++, etc.)
+- **Pattern extraction**: Identifies common coding patterns from documentation
+- **Categorization**: Smart categorization using URL structure, page titles, and content keywords with scoring
+- **Skill generation**: Creates SKILL.md with real code examples and categorized reference files
+
+### Data Flow
+1. **Scrape Phase**:
+   - Input: Config JSON (name, base_url, selectors, url_patterns, categories, rate_limit, max_pages)
+   - Process: BFS traversal starting from base_url, respecting include/exclude patterns
+   - Output: `output/{name}_data/pages/*.json` + `summary.json`
+
+2. **Build Phase**:
+   - Input: Scraped JSON data from `output/{name}_data/`
+   - Process: Load pages → Smart categorize → Extract patterns → Generate references
+   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`
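+
+The traversal in the scrape phase can be sketched as follows. This is a minimal illustration, not the code in `doc_scraper.py`: the names `is_allowed` and `crawl_order` are hypothetical, the same-host check is an assumption, and an in-memory dictionary stands in for real HTTP requests (so `rate_limit` is not modelled).
+
+```python
+from collections import deque
+from urllib.parse import urljoin, urlparse
+
+def is_allowed(url, base_url, include, exclude):
+    """Keep same-host URLs that match an include pattern and no exclude pattern."""
+    if urlparse(url).netloc != urlparse(base_url).netloc:  # assumption: stay on one host
+        return False
+    if include and not any(p in url for p in include):
+        return False
+    return not any(p in url for p in exclude)
+
+def crawl_order(base_url, get_links, include=(), exclude=(), max_pages=20):
+    """Breadth-first traversal; get_links(url) returns the hrefs found on a page."""
+    queue, seen, visited = deque([base_url]), {base_url}, []
+    while queue and len(visited) < max_pages:
+        url = queue.popleft()
+        visited.append(url)
+        for href in get_links(url):
+            link = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
+            if link not in seen and is_allowed(link, base_url, include, exclude):
+                seen.add(link)
+                queue.append(link)
+    return visited
+
+# Stand-in for real fetching: a tiny in-memory site (hypothetical URLs).
+site = {
+    "https://docs.example.com/": ["/guide/intro", "/blog/news", "https://other.example.org/x"],
+    "https://docs.example.com/guide/intro": ["/guide/setup", "/"],
+    "https://docs.example.com/guide/setup": [],
+}
+pages = crawl_order("https://docs.example.com/", lambda u: site.get(u, []),
+                    include=["/guide"], exclude=["/blog"])
+```
+
+Using a queue (rather than recursion) keeps pages close to the start URL ahead of deeper ones, which matters when `max_pages` cuts the crawl short.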
+
+### Directory Structure
+```
+doc-to-skill/
+├── doc_scraper.py             # Main scraping & building tool
+├── enhance_skill.py           # AI enhancement (API-based)
+├── enhance_skill_local.py     # AI enhancement (LOCAL, no API)
+├── configs/                   # Preset configurations
+│   ├── godot.json
+│   ├── react.json
+│   ├── steam-inventory.json
+│   └── ...
+└── output/
+    ├── {name}_data/           # Raw scraped data (cached)
+    │   ├── pages/             # Individual page JSONs
+    │   └── summary.json       # Scraping summary
+    └── {name}/                # Generated skill
+        ├── SKILL.md           # Main skill file with examples
+        ├── SKILL.md.backup    # Backup (if enhanced)
+        ├── references/        # Categorized documentation
+        │   ├── index.md
+        │   ├── getting_started.md
+        │   ├── api.md
+        │   └── ...
+        ├── scripts/           # Empty (for user scripts)
+        └── assets/            # Empty (for user assets)
+```
+
+### Configuration Format
+Config files in `configs/*.json` contain:
+- `name`: Skill identifier (e.g., "godot", "react")
+- `description`: When to use this skill
+- `base_url`: Starting URL for scraping
+- `selectors`: CSS selectors for content extraction
+  - `main_content`: Main documentation content (e.g., "article", "div[role='main']")
+  - `title`: Page title selector
+  - `code_blocks`: Code sample selector (e.g., "pre code", "pre")
+- `url_patterns`: URL filtering
+  - `include`: Only scrape URLs containing these patterns
+  - `exclude`: Skip URLs containing these patterns
+- `categories`: Keyword-based categorization mapping
+- `rate_limit`: Delay between requests (seconds)
+- `max_pages`: Maximum pages to scrape
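+
+As an illustration of the fields listed above, the sketch below builds a config in Python and serializes it to JSON. All values are made up, and two details are assumptions rather than facts from the tool: that `categories` maps a category name to a list of keywords, and that `h1` is a sensible `title` selector.
+
+```python
+import json
+
+# Illustrative values only; "myframework" and docs.example.com are placeholders.
+config = {
+    "name": "myframework",
+    "description": "Use when working with MyFramework APIs",
+    "base_url": "https://docs.example.com/",
+    "selectors": {
+        "main_content": "article",
+        "title": "h1",            # assumed selector
+        "code_blocks": "pre code",
+    },
+    "url_patterns": {"include": ["/docs"], "exclude": ["/blog"]},
+    # Assumed shape: category name -> keywords used for scoring.
+    "categories": {"getting_started": ["intro", "install"], "api": ["api", "reference"]},
+    "rate_limit": 0.5,
+    "max_pages": 20,
+}
+
+# Fields documented above; whether the tool requires all of them is not stated.
+DOCUMENTED_FIELDS = ["name", "description", "base_url", "selectors",
+                     "url_patterns", "categories", "rate_limit", "max_pages"]
+missing = [key for key in DOCUMENTED_FIELDS if key not in config]
+
+config_json = json.dumps(config, indent=2)  # text to save as configs/myframework.json
+```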
+
+### Key Features
+
+**Auto-detect existing data**: The tool checks for `output/{name}_data/` and, if it exists, prompts to reuse it instead of re-scraping.
+
+**Language detection**: Detects code languages from:
+1. CSS class attributes (`language-*`, `lang-*`)
+2. Heuristics (keywords like `def`, `const`, `func`, etc.)
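+
+The two-step lookup above might look like the following sketch. The function name `guess_language` and the keyword table are hypothetical; the heuristics in `detect_language()` may differ.
+
+```python
+import re
+
+# Hypothetical keyword hints, checked in order; not the tool's actual table.
+HINTS = [
+    ("python", re.compile(r"^\s*(def |import |from \w+ import )", re.M)),
+    ("javascript", re.compile(r"^\s*(const |let |function )", re.M)),
+    ("gdscript", re.compile(r"^\s*(func |extends )", re.M)),
+    ("cpp", re.compile(r"#include|std::")),
+]
+
+def guess_language(css_classes, code):
+    # 1. Prefer an explicit class such as "language-python" or "lang-js".
+    for cls in css_classes:
+        for prefix in ("language-", "lang-"):
+            if cls.startswith(prefix):
+                return cls[len(prefix):]
+    # 2. Otherwise fall back to the first keyword hint that matches.
+    for lang, pattern in HINTS:
+        if pattern.search(code):
+            return lang
+    return "unknown"
+```
+
+Checking the class attribute first means an explicit annotation from the documentation site always wins over a keyword guess.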
+
+**Pattern extraction**: Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
+
+**Smart categorization**:
+- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
+- Threshold of 2+ for categorization
+- Auto-infers categories from URL segments if none provided
+- Falls back to "other" category
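+
+The scoring rule above can be sketched as follows. This is an illustration under two assumptions that the description does not settle: points accumulate per matching keyword, and the highest-scoring category wins. The name `categorize` is hypothetical and is not `smart_categorize()` itself.
+
+```python
+def categorize(page, categories, threshold=2):
+    """Return the best-scoring category for a page, or "other"."""
+    url, title, content = (page[k].lower() for k in ("url", "title", "content"))
+    best, best_score = "other", 0
+    for category, keywords in categories.items():
+        score = 0
+        for kw in keywords:
+            kw = kw.lower()
+            # 3 points for a URL match, 2 for a title match, 1 for a content match.
+            score += 3 * (kw in url) + 2 * (kw in title) + 1 * (kw in content)
+        if score >= threshold and score > best_score:
+            best, best_score = category, score
+    return best
+
+categories = {"api": ["api"], "getting_started": ["install", "intro"]}
+api_page = {"url": "https://docs.example.com/api/items", "title": "Items", "content": ""}
+weak_page = {"url": "https://docs.example.com/misc", "title": "Notes", "content": "see the api"}
+```
+
+Here `api_page` scores 3 for `api` through its URL, while `weak_page` only reaches 1 through its content and so falls back to `other`.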
+
+**Enhanced SKILL.md**: Generated with:
+- Real code examples from documentation (language-annotated)
+- Quick reference patterns extracted from docs
+- Common pattern section
+- Category file listings
+
+**AI-Powered Enhancement**: Two scripts improve the generated SKILL.md:
+- `enhance_skill.py`: Uses the Anthropic API (~$0.15-$0.30 per skill, requires an API key)
+- `enhance_skill_local.py`: Uses Claude Code Max (no API cost, no API key needed)
+- Turns the generic template into a much fuller guide (75 lines to 570 lines in the steam-economy test)
+- Extracts the best examples, explains key concepts, and adds navigation guidance
+- Quality rating: 9/10 in the steam-economy test
+
+## Key Code Locations
+
+- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
+- **Content extraction**: `extract_content()` doc_scraper.py:64-131
+- **Language detection**: `detect_language()` doc_scraper.py:133-163
+- **Pattern extraction**: `extract_patterns()` doc_scraper.py:165-181
+- **Scraping loop**: `scrape_all()` doc_scraper.py:226-249
+- **Smart categorization**: `smart_categorize()` doc_scraper.py:280-321
+- **Category inference**: `infer_categories()` doc_scraper.py:323-349
+- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:351-370
+- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:424-540
+- **Main workflow**: `main()` doc_scraper.py:661-733
+
+## Workflow Examples
+
+### First-time run (scrape and build)
+```bash
+# 1. Scrape + Build
+python3 doc_scraper.py --config configs/godot.json
+# Time: 20-40 minutes
+
+# 2. Package (assuming skill-creator is available)
+python3 package_skill.py output/godot/
+
+# Result: output/godot.zip
+```
+
+### Using cached data (fast iteration)
+```bash
+# 1. Use existing data
+python3 doc_scraper.py --config configs/godot.json --skip-scrape
+# Time: 1-3 minutes
+
+# 2. Package
+python3 package_skill.py output/godot/
+```
+
+### Creating a new framework config
+```bash
+# Option 1: Interactive
+python3 doc_scraper.py --interactive
+
+# Option 2: Copy and modify
+cp configs/react.json configs/myframework.json
+# Edit configs/myframework.json
+python3 doc_scraper.py --config configs/myframework.json
+```
+
+## Testing Selectors
+
+To find the right CSS selectors for a documentation site:
+
+```python
+from bs4 import BeautifulSoup
+import requests
+
+url = "https://docs.example.com/page"
+soup = BeautifulSoup(requests.get(url, timeout=10).content, 'html.parser')  # timeout avoids hanging on a slow site
+
+# Try different selectors
+print(soup.select_one('article'))
+print(soup.select_one('main'))
+print(soup.select_one('div[role="main"]'))
+```
+
+## Troubleshooting
+
+**No content extracted**: Check `main_content` selector. Common values: `article`, `main`, `div[role="main"]`, `div.content`
+
+**Poor categorization**: Edit the `categories` section in the config, using keywords specific to the documentation's structure
+
+**Force re-scrape**: Delete cached data with `rm -rf output/{name}_data/`
+
+**Rate limiting issues**: Increase `rate_limit` value in config (e.g., from 0.5 to 1.0 seconds)
+
+## Output Quality Checks
+
+After building, verify quality:
+```bash
+cat output/godot/SKILL.md              # Should have real code examples
+cat output/godot/references/index.md   # Should show categories
+ls output/godot/references/            # Should have category .md files
+```
--- a/docs/ENHANCEMENT.md
+++ b/docs/ENHANCEMENT.md
@@ -0,0 +1,250 @@
+# AI-Powered SKILL.md Enhancement
+
+Two scripts are available to dramatically improve your SKILL.md file:
+1. **`enhance_skill_local.py`** - Uses Claude Code Max (no API key, **recommended**)
+2. **`enhance_skill.py`** - Uses Anthropic API (~$0.15-$0.30 per skill)
+
+Both analyze reference documentation and extract the best examples and guidance.
+
+## Why Use Enhancement?
+
+**Problem:** The auto-generated SKILL.md is often too generic:
+- Empty Quick Reference section
+- No practical code examples
+- Generic "When to Use" triggers
+- Doesn't highlight key features
+
+**Solution:** Let Claude read your reference docs and create a much better SKILL.md with:
+- ✅ Best code examples extracted from documentation
+- ✅ Practical quick reference with real patterns
+- ✅ Domain-specific guidance
+- ✅ Clear navigation tips
+- ✅ Key concepts explained
+
+## Quick Start (LOCAL - No API Key)
+
+**Recommended for Claude Code Max users:**
+
+```bash
+# Option 1: Standalone enhancement
+python3 enhance_skill_local.py output/steam-inventory/
+
+# Option 2: Integrated with scraper
+python3 doc_scraper.py --config configs/steam-inventory.json --enhance-local
+```
+
+**What happens:**
+1. Opens new terminal window
+2. Runs Claude Code with enhancement prompt
+3. Claude analyzes reference files (~15-20K chars)
+4. Generates enhanced SKILL.md (30-60 seconds)
+5. Terminal auto-closes when done
+
+**Requirements:**
+- Claude Code Max plan (you're already using it!)
+- macOS for the automatic terminal launch; on other operating systems, run it manually in a terminal
+
+## API-Based Enhancement (Alternative)
+
+**If you prefer the API-based approach:**
+
+### Installation
+
+```bash
+pip3 install anthropic
+```
+
+### Setup API Key
+
+```bash
+# Option 1: Environment variable (recommended)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Option 2: Pass directly with --api-key
+python3 enhance_skill.py output/react/ --api-key sk-ant-...
+```
+
+### Usage
+
+```bash
+# Standalone enhancement
+python3 enhance_skill.py output/steam-inventory/
+
+# Integrated with scraper
+python3 doc_scraper.py --config configs/steam-inventory.json --enhance
+
+# Dry run (see what would be done)
+python3 enhance_skill.py output/react/ --dry-run
+```
+
+## What It Does
+
+1. **Reads reference files** (api_reference.md, webapi.md, etc.)
+2. **Sends to Claude** with instructions to:
+   - Extract 5-10 best code examples
+   - Create practical quick reference
+   - Write domain-specific "When to Use" triggers
+   - Add helpful navigation guidance
+3. **Backs up original** SKILL.md to SKILL.md.backup
+4. **Saves enhanced version** as new SKILL.md
+
+## Example Enhancement
+
+### Before (Auto-Generated)
+```markdown
+## Quick Reference
+
+### Common Patterns
+
+*Quick reference patterns will be added as you use the skill.*
+```
+
+### After (AI-Enhanced)
+````markdown
+## Quick Reference
+
+### Common API Patterns
+
+**Granting promotional items:**
+```cpp
+void CInventory::GrantPromoItems()
+{
+    SteamItemDef_t newItems[2];
+    newItems[0] = 110;
+    newItems[1] = 111;
+    SteamInventory()->AddPromoItems( &s_GenerateRequestResult, newItems, 2 );
+}
+```
+
+**Getting all items in player inventory:**
+```cpp
+SteamInventoryResult_t resultHandle;
+bool success = SteamInventory()->GetAllItems( &resultHandle );
+```
+[... 8 more practical examples ...]
+````
+
+## Cost Estimate
+
+- **Input**: ~50,000-100,000 tokens (reference docs)
+- **Output**: ~4,000 tokens (enhanced SKILL.md)
+- **Model**: claude-sonnet-4-20250514
+- **Estimated cost**: $0.15-$0.30 per skill
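+
+The estimate can be reproduced with simple arithmetic. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens; these rates are an assumption, not taken from this repository, so check current pricing. Under that assumption, input alone comes to $0.15-$0.30, and the ~4,000 output tokens add about $0.06.
+
+```python
+# Assumed prices in USD per million tokens; not taken from this repository.
+INPUT_PRICE_PER_MTOK = 3.00
+OUTPUT_PRICE_PER_MTOK = 15.00
+
+def estimate_cost(input_tokens, output_tokens=4_000):
+    """Return the estimated USD cost of one enhancement run."""
+    return (input_tokens * INPUT_PRICE_PER_MTOK
+            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
+
+low = estimate_cost(50_000)    # lower bound of the input range above
+high = estimate_cost(100_000)  # upper bound of the input range above
+```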
+
+## Troubleshooting
+
+### "No API key provided"
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...
+# or
+python3 enhance_skill.py output/react/ --api-key sk-ant-...
+```
+
+### "No reference files found"
+Make sure you've run the scraper first:
+```bash
+python3 doc_scraper.py --config configs/react.json
+```
+
+### "anthropic package not installed"
+```bash
+pip3 install anthropic
+```
+
+### Don't like the result?
+```bash
+# Restore original
+mv output/steam-inventory/SKILL.md.backup output/steam-inventory/SKILL.md
+
+# Try again (it may generate different content)
+python3 enhance_skill.py output/steam-inventory/
+```
+
+## Tips
+
+1. **Run after scraping completes** - Enhancement works best with complete reference docs
+2. **Review the output** - AI is good but not perfect, so check the generated SKILL.md
+3. **Keep the backup** - Original is saved as SKILL.md.backup
+4. **Re-run if needed** - Each run may produce slightly different results
+5. **No re-scraping needed** - Reference files are local, so enhancement can be re-run without scraping again (the call to Claude still needs network access)
+
+## Real-World Results
+
+**Test Case: steam-economy skill**
+- **Before:** 75 lines, generic template, empty Quick Reference
+- **After:** 570 lines, 10 practical API examples, key concepts explained
+- **Time:** 60 seconds
+- **Quality Rating:** 9/10
+
+The LOCAL enhancement successfully:
+- Extracted best HTTP/JSON examples from 24 pages of documentation
+- Explained domain concepts (Asset Classes, Context IDs, Transaction Lifecycle)
+- Created navigation guidance for beginners through advanced users
+- Added best practices for security, economy design, and API integration
+
+## Limitations
+
+**LOCAL Enhancement (`enhance_skill_local.py`):**
+- Requires Claude Code Max plan
+- macOS auto-launch only (manual on other OS)
+- Opens new terminal window
+- Takes ~60 seconds
+
+**API Enhancement (`enhance_skill.py`):**
+- Requires Anthropic API key (paid)
+- Cost: ~$0.15-$0.30 per skill
+- Limited to ~100K tokens of reference input
+
+**Both:**
+- May occasionally miss the best examples
+- Can't understand context beyond the reference docs
+- Doesn't modify reference files (only SKILL.md)
+
+## Enhancement Options Comparison
+
+| Aspect | Manual Edit | LOCAL Enhancement | API Enhancement |
+|--------|-------------|-------------------|-----------------|
+| Time | 15-30 minutes | 30-60 seconds | 30-60 seconds |
+| Code examples | You pick | AI picks best | AI picks best |
+| Quick reference | Write yourself | Auto-generated | Auto-generated |
+| Domain guidance | Your knowledge | From docs | From docs |
+| Consistency | Varies | Consistent | Consistent |
+| Cost | Free (your time) | Free (Max plan) | ~$0.15-$0.30 per skill |
+| Setup | None | Claude Code Max plan | API key needed |
+| Quality | High (if expert) | 9/10 | 9/10 |
+| **Recommended?** | For experts only | ✅ **Yes** | If no Max plan |
+
+## When to Use
+
+**Use enhancement when:**
+- You want high-quality SKILL.md quickly
+- Working with large documentation (50+ pages)
+- Creating skills for unfamiliar frameworks
+- Need practical code examples extracted
+- Want consistent quality across multiple skills
+
+**Skip enhancement when:**
+- Budget constrained (use manual editing)
+- Very small documentation (<10 pages)
+- You know the framework intimately
+- Documentation has no code examples
+
+## Advanced: Customization
+
+To customize how Claude enhances the SKILL.md, edit `enhance_skill.py` and modify the `_build_enhancement_prompt()` method around line 130.
+
+Example customization:
+```python
+prompt += """
+ADDITIONAL REQUIREMENTS:
+- Focus on security best practices
+- Include performance tips
+- Add troubleshooting section
+"""
+```
+
+## See Also
+
+- [README.md](../README.md) - Main documentation
+- [CLAUDE.md](CLAUDE.md) - Architecture guide
+- [doc_scraper.py](../doc_scraper.py) - Main scraping tool
--- a/docs/UPLOAD_GUIDE.md
+++ b/docs/UPLOAD_GUIDE.md
@@ -0,0 +1,252 @@
+# How to Upload Skills to Claude
+
+## Quick Answer
+
+**You upload the `.zip` file created by `package_skill.py`**
+
+```bash
+# Create the zip file
+python3 package_skill.py output/steam-economy/
+
+# This creates: output/steam-economy.zip
+# Upload this file to Claude!
+```
+
+## What's Inside the Zip?
+
+The `.zip` file contains:
+
+```
+steam-economy.zip
+├── SKILL.md              ← Main skill file (Claude reads this first)
+└── references/           ← Reference documentation
+    ├── index.md          ← Category index
+    ├── api_reference.md  ← API docs
+    ├── pricing.md        ← Pricing docs
+    ├── trading.md        ← Trading docs
+    └── ...               ← Other categorized docs
+```
+
+**Note:** The zip only includes what Claude needs. It excludes:
+- `.backup` files
+- Build artifacts
+- Temporary files
+
+## What Does package_skill.py Do?
+
+The package script:
+
+1. **Finds your skill directory** (e.g., `output/steam-economy/`)
+2. **Validates SKILL.md exists** (required!)
+3. **Creates a .zip file** with the same name
+4. **Includes all files** except backups
+5. **Saves to** `output/` directory
+
+**Example:**
+```bash
+python3 package_skill.py output/steam-economy/
+
+📦 Packaging skill: steam-economy
+   Source: output/steam-economy
+   Output: output/steam-economy.zip
+   + SKILL.md
+   + references/api_reference.md
+   + references/pricing.md
+   + references/trading.md
+   + ...
+
+✅ Package created: output/steam-economy.zip
+   Size: 14,290 bytes (14.0 KB)
+```
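+
+The packaging steps listed above can be approximated in a few lines. This is a sketch under stated assumptions, not the contents of `package_skill.py`: the function name `package_skill_sketch` is hypothetical, and only files ending in `.backup` are skipped.
+
+```python
+import zipfile
+from pathlib import Path
+
+def package_skill_sketch(skill_dir):
+    """Zip a skill directory next to it, skipping *.backup files."""
+    skill_dir = Path(skill_dir).resolve()
+    # SKILL.md is required, so fail early if it is missing.
+    if not (skill_dir / "SKILL.md").is_file():
+        raise FileNotFoundError(f"SKILL.md not found in {skill_dir}")
+    # The archive takes the directory's name and lands in its parent (e.g. output/).
+    zip_path = skill_dir.parent / f"{skill_dir.name}.zip"
+    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
+        for path in sorted(skill_dir.rglob("*")):
+            if path.is_file() and path.suffix != ".backup":
+                # Store paths relative to the skill root so SKILL.md sits at the top level.
+                zf.write(path, path.relative_to(skill_dir).as_posix())
+    return zip_path
+```
+
+Writing the archive to the parent directory keeps the zip from being swept into itself while the skill directory is walked.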
+
+## Complete Workflow
+
+### Step 1: Scrape & Build
+```bash
+python3 doc_scraper.py --config configs/steam-economy.json
+```
+
+**Output:**
+- `output/steam-economy_data/` (raw scraped data)
+- `output/steam-economy/` (skill directory)
+
+### Step 2: Enhance (Recommended)
+```bash
+python3 enhance_skill_local.py output/steam-economy/
+```
+
+**What it does:**
+- Analyzes reference files
+- Creates comprehensive SKILL.md
+- Backs up original to SKILL.md.backup
+
+**Output:**
+- `output/steam-economy/SKILL.md` (enhanced)
+- `output/steam-economy/SKILL.md.backup` (original)
+
+### Step 3: Package
+```bash
+python3 package_skill.py output/steam-economy/
+```
+
+**Output:**
+- `output/steam-economy.zip` ← **THIS IS WHAT YOU UPLOAD**
+
+### Step 4: Upload to Claude
+1. Go to Claude (claude.ai)
+2. Click "Add Skill" or the skill upload button
+3. Select `output/steam-economy.zip`
+4. Done!
+
+## What Files Are Required?
+
+**Minimum required structure:**
+```
+your-skill/
+└── SKILL.md          ← Required! Claude reads this first
+```
+
+**Recommended structure:**
+```
+your-skill/
+├── SKILL.md          ← Main skill file (required)
+└── references/       ← Reference docs (highly recommended)
+    ├── index.md
+    └── *.md          ← Category files
+```
+
+**Optional (can add manually):**
+```
+your-skill/
+├── SKILL.md
+├── references/
+├── scripts/          ← Helper scripts
+│   └── *.py
+└── assets/           ← Templates, examples
+    └── *.txt
+```
+
+## File Size Limits
+
+The package script shows size after packaging:
+```
+✅ Package created: output/steam-economy.zip
+   Size: 14,290 bytes (14.0 KB)
+```
+
+**Typical sizes:**
+- Small skill: 5-20 KB
+- Medium skill: 20-100 KB
+- Large skill: 100-500 KB
+
+Claude has generous size limits, so most documentation-based skills fit easily.
+
+## Quick Reference
+
+### Package a Skill
+```bash
+python3 package_skill.py output/steam-economy/
+```
+
+### Package Multiple Skills
+```bash
+# Package all skills in output/
+for dir in output/*/; do
+  if [ -f "$dir/SKILL.md" ]; then
+    python3 package_skill.py "$dir"
+  fi
+done
+```
+
+### Check What's in a Zip
+```bash
+unzip -l output/steam-economy.zip
+```
+
+### Test a Packaged Skill Locally
+```bash
+# Extract to temp directory
+mkdir temp-test
+unzip output/steam-economy.zip -d temp-test/
+cat temp-test/SKILL.md
+```
+
+## Troubleshooting
+
+### "SKILL.md not found"
+```bash
+# Make sure you scraped and built first
+python3 doc_scraper.py --config configs/steam-economy.json
+
+# Then package
+python3 package_skill.py output/steam-economy/
+```
+
+### "Directory not found"
+```bash
+# Check what skills are available
+ls output/
+
+# Use correct path
+python3 package_skill.py output/YOUR-SKILL-NAME/
+```
+
+### Zip is Too Large
+Most skills are small, but if yours is large:
+```bash
+# Check size
+ls -lh output/steam-economy.zip
+
+# If needed, check what's taking space
+unzip -l output/steam-economy.zip | sort -k1 -rn | head -20
+```
+
+Reference files are usually small. Large sizes often mean:
+- Many images (skills typically don't need images)
+- Large code examples (these are fine, just be aware)
+
+## What Does Claude Do With the Zip?
+
+When you upload a skill zip:
+
+1. **Claude extracts it**
+2. **Reads SKILL.md first** - This tells Claude:
+   - When to activate this skill
+   - What the skill does
+   - Quick reference examples
+   - How to navigate the references
+3. **Indexes reference files** - Claude can search through:
+   - `references/*.md` files
+   - Find specific APIs, examples, concepts
+4. **Activates automatically** - When you ask about topics matching the skill
+
+## Example: Using the Packaged Skill
+
+After uploading `steam-economy.zip`:
+
+**You ask:** "How do I implement microtransactions in my Steam game?"
+
+**Claude:**
+- Recognizes this matches steam-economy skill
+- Reads SKILL.md for quick reference
+- Searches references/microtransactions.md
+- Provides detailed answer with code examples
+
+## Summary
+
+**What you need to do:**
+1. ✅ Scrape: `python3 doc_scraper.py --config configs/YOUR-CONFIG.json`
+2. ✅ Enhance: `python3 enhance_skill_local.py output/YOUR-SKILL/`
+3. ✅ Package: `python3 package_skill.py output/YOUR-SKILL/`
+4. ✅ Upload: Upload the `.zip` file to Claude
+
+**What you upload:**
+- The `.zip` file from `output/` directory
+- Example: `output/steam-economy.zip`
+
+**What's in the zip:**
+- `SKILL.md` (required)
+- `references/*.md` (recommended)
+- Any scripts/assets you added (optional)
+
+That's it! 🚀