From f8c75a3b2dbebb97ac30759229e4f6b218951476 Mon Sep 17 00:00:00 2001 From: yusyus Date: Sun, 19 Oct 2025 01:43:02 +0300 Subject: [PATCH] Add comprehensive CLAUDE.md for Claude Code integration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add root-level CLAUDE.md with complete guidance for Claude Code - Include Python 3.7+ requirement - Add first-time user workflow with all commands - Include CSS selector testing with BeautifulSoup examples - Add output quality verification commands - Document force re-scrape instructions - Fix package_skill.py path (remove hardcoded /mnt/skills reference) - Add complete config file structure with real examples - Include testing section for selector validation - Add performance metrics table - Document all key code locations with line numbers - Organize by: quick start → architecture → workflows → troubleshooting - Preserve existing docs/CLAUDE.md as detailed technical reference 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- CLAUDE.md | 493 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 493 insertions(+) create mode 100644 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..6b9729e --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,493 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Overview + +Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable `.zip` file for Claude. + +## Prerequisites + +**Python Version:** Python 3.7 or higher + +**Required Dependencies:** +```bash +pip3 install requests beautifulsoup4 +``` + +**Optional (for API-based enhancement):** +```bash +pip3 install anthropic +export ANTHROPIC_API_KEY=sk-ant-... +``` + +## Core Commands + +### Quick Start - Use a Preset + +```bash +# Scrape and build with a preset configuration +python3 doc_scraper.py --config configs/godot.json +python3 doc_scraper.py --config configs/react.json +python3 doc_scraper.py --config configs/vue.json +python3 doc_scraper.py --config configs/django.json +python3 doc_scraper.py --config configs/fastapi.json +``` + +### First-Time User Workflow (Recommended) + +```bash +# 1. Install dependencies (one-time) +pip3 install requests beautifulsoup4 + +# 2. Scrape with local enhancement (uses Claude Code Max, no API key) +python3 doc_scraper.py --config configs/godot.json --enhance-local +# Time: 20-40 minutes scraping + 60 seconds enhancement + +# 3. Package the skill +python3 package_skill.py output/godot/ + +# Result: godot.zip ready to upload to Claude +``` + +### Interactive Mode + +```bash +# Step-by-step configuration wizard +python3 doc_scraper.py --interactive +``` + +### Quick Mode (Minimal Config) + +```bash +# Create skill from any documentation URL +python3 doc_scraper.py --name react --url https://react.dev/ --description "React framework for UIs" +``` + +### Skip Scraping (Use Cached Data) + +```bash +# Fast rebuild using previously scraped data +python3 doc_scraper.py --config configs/godot.json --skip-scrape +# Time: 1-3 minutes (instant rebuild) +``` + +### Enhancement Options + +**LOCAL Enhancement (Recommended - No API Key Required):** +```bash +# During scraping +python3 doc_scraper.py --config configs/react.json --enhance-local + +# Standalone after scraping +python3 enhance_skill_local.py output/react/ +``` + +**API Enhancement (Alternative - Requires API Key):** +```bash +# During scraping +python3 doc_scraper.py --config configs/react.json --enhance + +# Standalone after scraping +python3 enhance_skill.py output/react/ +python3 enhance_skill.py output/react/ --api-key sk-ant-... +``` + +### Package the Skill + +```bash +# Package skill directory into .zip file +python3 package_skill.py output/godot/ +# Result: output/godot.zip +``` + +### Force Re-scrape + +```bash +# Delete cached data and re-scrape from scratch +rm -rf output/godot_data/ +python3 doc_scraper.py --config configs/godot.json +``` + +## Repository Architecture + +### File Structure + +``` +Skill_Seekers/ +├── doc_scraper.py # Main tool (single-file, ~790 lines) +├── enhance_skill.py # AI enhancement (API-based) +├── enhance_skill_local.py # AI enhancement (LOCAL, no API) +├── package_skill.py # Skill packager +├── configs/ # Preset configurations +│ ├── godot.json +│ ├── react.json +│ ├── vue.json +│ ├── django.json +│ ├── fastapi.json +│ └── steam-economy-complete.json +├── docs/ # Documentation +│ ├── CLAUDE.md # Detailed technical architecture +│ ├── ENHANCEMENT.md # Enhancement guide +│ └── UPLOAD_GUIDE.md # How to upload skills +└── output/ # Generated output (git-ignored) + ├── {name}_data/ # Scraped raw data (cached) + │ ├── pages/*.json # Individual page data + │ └── summary.json # Scraping summary + └── {name}/ # Built skill directory + ├── SKILL.md # Main skill file + ├── SKILL.md.backup # Backup (if enhanced) + ├── references/ # Categorized documentation + │ ├── index.md + │ ├── getting_started.md + │ ├── api.md + │ └── ... + ├── scripts/ # Empty (user scripts) + └── assets/ # Empty (user assets) +``` + +### Data Flow + +1. **Scrape Phase** (`scrape_all()` in doc_scraper.py:228-251): + - Input: Config JSON (name, base_url, selectors, url_patterns, categories) + - Process: BFS traversal from base_url, respecting include/exclude patterns + - Output: `output/{name}_data/pages/*.json` + `summary.json` + +2. **Build Phase** (`build_skill()` in doc_scraper.py:561-601): + - Input: Scraped JSON data from `output/{name}_data/` + - Process: Load pages → Smart categorize → Extract patterns → Generate references + - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md` + +3. **Enhancement Phase** (optional): + - Input: Built skill directory with references + - Process: Claude analyzes references and rewrites SKILL.md + - Output: Enhanced SKILL.md with real examples and guidance + +4. **Package Phase**: + - Input: Skill directory + - Process: Zip all files (excluding .backup) + - Output: `{name}.zip` + +### Configuration File Structure + +Config files (`configs/*.json`) define scraping behavior: + +```json +{ + "name": "godot", + "description": "When to use this skill", + "base_url": "https://docs.godotengine.org/en/stable/", + "selectors": { + "main_content": "div[role='main']", + "title": "title", + "code_blocks": "pre" + }, + "url_patterns": { + "include": [], + "exclude": ["/search.html", "/_static/"] + }, + "categories": { + "getting_started": ["introduction", "getting_started"], + "scripting": ["scripting", "gdscript"], + "api": ["api", "reference", "class"] + }, + "rate_limit": 0.5, + "max_pages": 500 +} +``` + +**Config Parameters:** +- `name`: Skill identifier (output directory name) +- `description`: When Claude should use this skill +- `base_url`: Starting URL for scraping +- `selectors.main_content`: CSS selector for main content (common: `article`, `main`, `div[role="main"]`) +- `selectors.title`: CSS selector for page title +- `selectors.code_blocks`: CSS selector for code samples +- `url_patterns.include`: Only scrape URLs containing these patterns +- `url_patterns.exclude`: Skip URLs containing these patterns +- `categories`: Keyword mapping for categorization +- `rate_limit`: Delay between requests (seconds) +- `max_pages`: Maximum pages to scrape + +## Key Features & Implementation + +### Auto-Detect Existing Data +Tool checks for `output/{name}_data/` and prompts to reuse, avoiding re-scraping (check_existing_data() in doc_scraper.py:653-660). + +### Language Detection +Detects code languages from: +1. CSS class attributes (`language-*`, `lang-*`) +2. Heuristics (keywords like `def`, `const`, `func`, etc.) + +See: `detect_language()` in doc_scraper.py:135-165 + +### Pattern Extraction +Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page). + +See: `extract_patterns()` in doc_scraper.py:167-183 + +### Smart Categorization +- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content) +- Threshold of 2+ for categorization +- Auto-infers categories from URL segments if none provided +- Falls back to "other" category + +See: `smart_categorize()` and `infer_categories()` in doc_scraper.py:282-351 + +### Enhanced SKILL.md Generation +Generated with: +- Real code examples from documentation (language-annotated) +- Quick reference patterns extracted from docs +- Common pattern section +- Category file listings + +See: `create_enhanced_skill_md()` in doc_scraper.py:426-542 + +## Common Workflows + +### First Time (With Scraping + Enhancement) + +```bash +# 1. Scrape + Build + AI Enhancement (LOCAL, no API key) +python3 doc_scraper.py --config configs/godot.json --enhance-local + +# 2. Wait for enhancement terminal to close (~60 seconds) + +# 3. Verify quality +cat output/godot/SKILL.md + +# 4. Package +python3 package_skill.py output/godot/ + +# Result: godot.zip ready for Claude +# Time: 20-40 minutes (scraping) + 60 seconds (enhancement) +``` + +### Using Cached Data (Fast Iteration) + +```bash +# 1. Use existing data + Local Enhancement +python3 doc_scraper.py --config configs/godot.json --skip-scrape +python3 enhance_skill_local.py output/godot/ + +# 2. Package +python3 package_skill.py output/godot/ + +# Time: 1-3 minutes (build) + 60 seconds (enhancement) +``` + +### Without Enhancement (Basic) + +```bash +# 1. Scrape + Build (no enhancement) +python3 doc_scraper.py --config configs/godot.json + +# 2. Package +python3 package_skill.py output/godot/ + +# Note: SKILL.md will be basic template - enhancement recommended +# Time: 20-40 minutes +``` + +### Creating a New Framework Config + +**Option 1: Interactive** +```bash +python3 doc_scraper.py --interactive +# Follow prompts, it creates the config for you +``` + +**Option 2: Copy and Modify** +```bash +# Copy a preset +cp configs/react.json configs/myframework.json + +# Edit it +nano configs/myframework.json + +# Test with limited pages first +# Set "max_pages": 20 in config + +# Use it +python3 doc_scraper.py --config configs/myframework.json +``` + +## Testing & Verification + +### Finding the Right CSS Selectors + +Before creating a config, test selectors with BeautifulSoup: + +```python +from bs4 import BeautifulSoup +import requests + +url = "https://docs.example.com/page" +soup = BeautifulSoup(requests.get(url).content, 'html.parser') + +# Try different selectors +print(soup.select_one('article')) +print(soup.select_one('main')) +print(soup.select_one('div[role="main"]')) +print(soup.select_one('div.content')) + +# Test code block selector +print(soup.select('pre code')) +print(soup.select('pre')) +``` + +### Verify Output Quality + +After building, verify the skill quality: + +```bash +# Check SKILL.md has real examples +cat output/godot/SKILL.md + +# Check category structure +cat output/godot/references/index.md + +# List all reference files +ls output/godot/references/ + +# Check specific category content +cat output/godot/references/getting_started.md + +# Verify code samples have language detection +grep -A 3 "```" output/godot/references/*.md | head -20 +``` + +### Test with Limited Pages + +For faster testing, edit config to limit pages: + +```json +{ + "max_pages": 20 // Test with just 20 pages +} +``` + +## Troubleshooting + +### No Content Extracted +**Problem:** Pages scraped but content is empty + +**Solution:** Check `main_content` selector in config. Try: +- `article` +- `main` +- `div[role="main"]` +- `div.content` + +Use the BeautifulSoup testing approach above to find the right selector. + +### Poor Categorization +**Problem:** Pages not categorized well + +**Solution:** Edit `categories` section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data: + +```bash +# See what URLs were scraped +cat output/godot_data/summary.json | grep url | head -20 +``` + +### Data Exists But Won't Use It +**Problem:** Tool won't reuse existing data + +**Solution:** Force re-scrape: +```bash +rm -rf output/myframework_data/ +python3 doc_scraper.py --config configs/myframework.json +``` + +### Rate Limiting Issues +**Problem:** Getting rate limited or blocked by documentation server + +**Solution:** Increase `rate_limit` value in config: +```json +{ + "rate_limit": 1.0 // Change from 0.5 to 1.0 seconds +} +``` + +### Package Path Error +**Problem:** doc_scraper.py shows wrong package_skill.py path + +**Expected output:** +```bash +python3 package_skill.py output/godot/ +``` + +**Not:** +```bash +python3 /mnt/skills/examples/skill-creator/scripts/package_skill.py output/godot/ +``` + +The correct command uses the local `package_skill.py` in the repository root. + +## Key Code Locations + +- **URL validation**: `is_valid_url()` doc_scraper.py:49-64 +- **Content extraction**: `extract_content()` doc_scraper.py:66-133 +- **Language detection**: `detect_language()` doc_scraper.py:135-165 +- **Pattern extraction**: `extract_patterns()` doc_scraper.py:167-183 +- **Smart categorization**: `smart_categorize()` doc_scraper.py:282-323 +- **Category inference**: `infer_categories()` doc_scraper.py:325-351 +- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:353-372 +- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:426-542 +- **Scraping loop**: `scrape_all()` doc_scraper.py:228-251 +- **Main workflow**: `main()` doc_scraper.py:663-789 + +## Enhancement Details + +### LOCAL Enhancement (Recommended) +- Uses your Claude Code Max plan (no API costs) +- Opens new terminal with Claude Code +- Analyzes reference files automatically +- Takes 30-60 seconds +- Quality: 9/10 (comparable to API version) +- Backs up original SKILL.md to SKILL.md.backup + +### API Enhancement (Alternative) +- Uses Anthropic API (~$0.15-$0.30 per skill) +- Requires ANTHROPIC_API_KEY +- Same quality as LOCAL +- Faster (no terminal launch) +- Better for automation/CI + +**What Enhancement Does:** +1. Reads reference documentation files +2. Analyzes content with Claude +3. Extracts 5-10 best code examples +4. Creates comprehensive quick reference +5. Adds domain-specific key concepts +6. Provides navigation guidance for different skill levels +7. Transforms 75-line templates into 500+ line comprehensive guides + +## Performance + +| Task | Time | Notes | +|------|------|-------| +| Scraping | 15-45 min | First time only | +| Building | 1-3 min | Fast! | +| Re-building | <1 min | With --skip-scrape | +| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max | +| Enhancement (API) | 20-40 sec | Requires API key | +| Packaging | 5-10 sec | Final zip | + +## Additional Documentation + +- **[README.md](README.md)** - Complete user documentation +- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps +- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Detailed technical architecture +- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide +- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude +- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure + +## Notes for Claude Code + +- This is a Python-based documentation scraper +- Single-file design (`doc_scraper.py` ~790 lines) +- No build system, no tests, minimal dependencies +- Output is cached and reusable +- Enhancement is optional but highly recommended +- All scraped data stored in `output/` (git-ignored)