From 3144d3cf3a7fc2be0c3f3964681eea37d3a75833 Mon Sep 17 00:00:00 2001 From: yusyus Date: Sun, 19 Oct 2025 13:34:02 +0300 Subject: [PATCH] Add comprehensive usage guide for all tools and workflows MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add docs/USAGE.md (~650 lines) - Complete command reference for all tools - Full help output for doc_scraper.py, estimate_pages.py, run_tests.py - Usage examples for enhancement and packaging tools - All 6 preset configs documented with details - 6 common workflows from quick start to advanced - Troubleshooting section with solutions - Advanced usage: custom selectors, URL patterns, categories - Performance tips and best practices - Exit codes, environment variables, file locations Tools covered: - doc_scraper.py (main tool with all options) - estimate_pages.py (page count estimator) - enhance_skill.py (API enhancement) - enhance_skill_local.py (local enhancement) - package_skill.py (skill packager) - run_tests.py (test runner) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/USAGE.md | 811 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 811 insertions(+) create mode 100644 docs/USAGE.md diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 0000000..b5d7374 --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,811 @@ +# Complete Usage Guide for Skill Seeker + +Comprehensive reference for all commands, options, and workflows. + +## Table of Contents + +- [Quick Reference](#quick-reference) +- [Main Tool: doc_scraper.py](#main-tool-doc_scraperpy) +- [Estimator: estimate_pages.py](#estimator-estimate_pagespy) +- [Enhancement Tools](#enhancement-tools) +- [Packaging Tool](#packaging-tool) +- [Testing Tools](#testing-tools) +- [Available Configs](#available-configs) +- [Common Workflows](#common-workflows) +- [Troubleshooting](#troubleshooting) + +--- + +## Quick Reference + +```bash +# 1. Estimate pages (fast, 1-2 min) +python3 estimate_pages.py configs/react.json + +# 2. Scrape documentation (20-40 min) +python3 doc_scraper.py --config configs/react.json + +# 3. Enhance with Claude Code (60 sec) +python3 enhance_skill_local.py output/react/ + +# 4. Package to .zip (instant) +python3 package_skill.py output/react/ + +# 5. Test everything (1 sec) +python3 run_tests.py +``` + +--- + +## Main Tool: doc_scraper.py + +### Full Help + +``` +usage: doc_scraper.py [-h] [--interactive] [--config CONFIG] [--name NAME] + [--url URL] [--description DESCRIPTION] [--skip-scrape] + [--dry-run] [--enhance] [--enhance-local] + [--api-key API_KEY] + +Convert documentation websites to Claude skills + +options: + -h, --help Show this help message and exit + --interactive, -i Interactive configuration mode + --config, -c CONFIG Load configuration from file (e.g., configs/godot.json) + --name NAME Skill name + --url URL Base documentation URL + --description, -d DESCRIPTION + Skill description + --skip-scrape Skip scraping, use existing data + --dry-run Preview what will be scraped without actually scraping + --enhance Enhance SKILL.md using Claude API after building + (requires API key) + --enhance-local Enhance SKILL.md using Claude Code in new terminal + (no API key needed) + --api-key API_KEY Anthropic API key for --enhance (or set ANTHROPIC_API_KEY) +``` + +### Usage Examples + +**1. Use Preset Config (Recommended)** +```bash +python3 doc_scraper.py --config configs/godot.json +python3 doc_scraper.py --config configs/react.json +python3 doc_scraper.py --config configs/vue.json +python3 doc_scraper.py --config configs/django.json +python3 doc_scraper.py --config configs/fastapi.json +``` + +**2. Interactive Mode** +```bash +python3 doc_scraper.py --interactive +# Wizard walks you through: +# - Skill name +# - Base URL +# - Description +# - Selectors (optional) +# - URL patterns (optional) +# - Rate limit +# - Max pages +``` + +**3. Quick Mode (Minimal)** +```bash +python3 doc_scraper.py \ + --name react \ + --url https://react.dev/ \ + --description "React framework for building UIs" +``` + +**4. Dry-Run (Preview)** +```bash +python3 doc_scraper.py --config configs/react.json --dry-run +# Shows what will be scraped without downloading data +# No directories created +# Fast validation +``` + +**5. Skip Scraping (Use Cached Data)** +```bash +python3 doc_scraper.py --config configs/godot.json --skip-scrape +# Uses existing output/godot_data/ +# Fast rebuild (1-3 minutes) +# Useful for testing changes +``` + +**6. With Local Enhancement** +```bash +python3 doc_scraper.py --config configs/react.json --enhance-local +# Scrapes + enhances in one command +# Opens new terminal for Claude Code +# No API key needed +``` + +**7. With API Enhancement** +```bash +export ANTHROPIC_API_KEY=sk-ant-... +python3 doc_scraper.py --config configs/react.json --enhance + +# Or with inline API key: +python3 doc_scraper.py --config configs/react.json --enhance --api-key sk-ant-... +``` + +### Output Structure + +``` +output/ +├── {name}_data/ # Scraped raw data (cached) +│ ├── pages/ +│ │ ├── page_0.json +│ │ ├── page_1.json +│ │ └── ... +│ └── summary.json # Scraping stats +│ +└── {name}/ # Built skill directory + ├── SKILL.md # Main skill file + ├── SKILL.md.backup # Backup (if enhanced) + ├── references/ # Categorized docs + │ ├── index.md + │ ├── getting_started.md + │ ├── api.md + │ └── ... + ├── scripts/ # Empty (user scripts) + └── assets/ # Empty (user assets) +``` + +--- + +## Estimator: estimate_pages.py + +### Full Help + +``` +usage: estimate_pages.py [-h] [--max-discovery MAX_DISCOVERY] + [--timeout TIMEOUT] + config + +Estimate page count for Skill Seeker configs + +positional arguments: + config Path to config JSON file + +options: + -h, --help Show this help message and exit + --max-discovery, -m MAX_DISCOVERY + Maximum pages to discover (default: 1000) + --timeout, -t TIMEOUT + HTTP request timeout in seconds (default: 30) +``` + +### Usage Examples + +**1. Quick Estimate (100 pages)** +```bash +python3 estimate_pages.py configs/react.json --max-discovery 100 +# Time: ~30-60 seconds +# Good for: Quick validation +``` + +**2. Standard Estimate (1000 pages - default)** +```bash +python3 estimate_pages.py configs/godot.json +# Time: ~1-2 minutes +# Good for: Most use cases +``` + +**3. Deep Estimate (2000 pages)** +```bash +python3 estimate_pages.py configs/vue.json --max-discovery 2000 +# Time: ~3-5 minutes +# Good for: Large documentation sites +``` + +**4. Custom Timeout** +```bash +python3 estimate_pages.py configs/django.json --timeout 60 +# Useful for slow servers +``` + +### Output Example + +``` +🔍 Estimating pages for: react +📍 Base URL: https://react.dev/ +🎯 Start URLs: 6 +⏱️ Rate limit: 0.5s +🔢 Max discovery: 1000 + +⏳ Discovered: 180 pages (1.3 pages/sec) + +====================================================================== +📊 ESTIMATION RESULTS +====================================================================== + +Config: react +Base URL: https://react.dev/ + +✅ Pages Discovered: 180 +⏳ Pages Pending: 50 +📈 Estimated Total: 230 + +⏱️ Time Elapsed: 140.5s +⚡ Discovery Rate: 1.28 pages/sec + +====================================================================== +💡 RECOMMENDATIONS +====================================================================== + +✅ Current max_pages (300) is sufficient + +⏱️ Estimated full scrape time: 1.9 minutes + (Based on rate_limit: 0.5s) +``` + +**What It Shows:** +- Estimated total pages to scrape +- Whether current `max_pages` is sufficient +- Recommended `max_pages` value +- Estimated scraping time +- Discovery rate (pages/sec) + +--- + +## Enhancement Tools + +### enhance_skill_local.py (Recommended) + +**No API key needed - uses Claude Code Max plan** + +```bash +# Usage +python3 enhance_skill_local.py output/react/ +python3 enhance_skill_local.py output/godot/ + +# What it does: +# 1. Reads SKILL.md and references/ +# 2. Opens new terminal with Claude Code +# 3. Claude enhances SKILL.md +# 4. Backs up original to SKILL.md.backup +# 5. Saves enhanced version + +# Time: ~60 seconds +# Cost: Free (uses your Claude Code Max plan) +``` + +### enhance_skill.py (Alternative) + +**Requires Anthropic API key** + +```bash +# Install dependency first +pip3 install anthropic + +# Usage with environment variable +export ANTHROPIC_API_KEY=sk-ant-... +python3 enhance_skill.py output/react/ + +# Usage with inline API key +python3 enhance_skill.py output/godot/ --api-key sk-ant-... + +# What it does: +# 1. Reads SKILL.md and references/ +# 2. Calls Claude API (Sonnet 4) +# 3. Enhances SKILL.md +# 4. Backs up original to SKILL.md.backup +# 5. Saves enhanced version + +# Time: ~30-60 seconds +# Cost: ~$0.01-0.10 per skill (depending on size) +``` + +--- + +## Packaging Tool + +### package_skill.py + +```bash +# Usage +python3 package_skill.py output/react/ +python3 package_skill.py output/godot/ + +# What it does: +# 1. Validates SKILL.md exists +# 2. Creates .zip with all skill files +# 3. Saves to output/{name}.zip + +# Output: +# output/react.zip +# output/godot.zip + +# Time: Instant +``` + +--- + +## Testing Tools + +### run_tests.py + +```bash +# Run all tests (default) +python3 run_tests.py +# 71 tests, ~1 second + +# Verbose output +python3 run_tests.py -v +python3 run_tests.py --verbose + +# Quiet output +python3 run_tests.py -q +python3 run_tests.py --quiet + +# Stop on first failure +python3 run_tests.py -f +python3 run_tests.py --failfast + +# Run specific test suite +python3 run_tests.py --suite config +python3 run_tests.py --suite features +python3 run_tests.py --suite integration + +# List all tests +python3 run_tests.py --list +``` + +### Individual Tests + +```bash +# Run single test file +python3 -m unittest tests.test_config_validation +python3 -m unittest tests.test_scraper_features +python3 -m unittest tests.test_integration + +# Run single test class +python3 -m unittest tests.test_config_validation.TestConfigValidation + +# Run single test method +python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config +``` + +--- + +## Available Configs + +### Preset Configs (Ready to Use) + +| Config | Framework | Pages | Description | +|--------|-----------|-------|-------------| +| `godot.json` | Godot Engine | ~500 | Game engine documentation | +| `react.json` | React | ~300 | React framework docs | +| `vue.json` | Vue.js | ~250 | Vue.js framework docs | +| `django.json` | Django | ~400 | Django web framework | +| `fastapi.json` | FastAPI | ~200 | FastAPI Python framework | +| `steam-economy-complete.json` | Steam | ~100 | Steam Economy API docs | + +### View Config Details + +```bash +# List all configs +ls configs/ + +# View config content +cat configs/react.json +python3 -m json.tool configs/godot.json +``` + +### Config Structure + +```json +{ + "name": "react", + "base_url": "https://react.dev/", + "description": "React - JavaScript library for building UIs", + "start_urls": [ + "https://react.dev/learn", + "https://react.dev/reference/react", + "https://react.dev/reference/react-dom" + ], + "selectors": { + "main_content": "article", + "title": "h1", + "code_blocks": "pre code" + }, + "url_patterns": { + "include": ["/learn/", "/reference/"], + "exclude": ["/blog/", "/community/"] + }, + "categories": { + "getting_started": ["learn", "tutorial", "intro"], + "api": ["reference", "api", "hooks"], + "guides": ["guide"] + }, + "rate_limit": 0.5, + "max_pages": 300 +} +``` + +--- + +## Common Workflows + +### Workflow 1: Use Preset (Fastest) + +```bash +# 1. Estimate (optional, 1-2 min) +python3 estimate_pages.py configs/react.json + +# 2. Scrape with local enhancement (25 min) +python3 doc_scraper.py --config configs/react.json --enhance-local + +# 3. Package (instant) +python3 package_skill.py output/react/ + +# Result: output/react.zip +# Upload to Claude! +``` + +### Workflow 2: Custom Documentation + +```bash +# 1. Create config +cat > configs/my-docs.json << 'EOF' +{ + "name": "my-docs", + "base_url": "https://docs.example.com/", + "description": "My documentation site", + "rate_limit": 0.5, + "max_pages": 200 +} +EOF + +# 2. Estimate +python3 estimate_pages.py configs/my-docs.json + +# 3. Dry-run test +python3 doc_scraper.py --config configs/my-docs.json --dry-run + +# 4. Full scrape +python3 doc_scraper.py --config configs/my-docs.json + +# 5. Enhance +python3 enhance_skill_local.py output/my-docs/ + +# 6. Package +python3 package_skill.py output/my-docs/ +``` + +### Workflow 3: Interactive Mode + +```bash +# 1. Start interactive wizard +python3 doc_scraper.py --interactive + +# 2. Answer prompts: +# - Name: my-framework +# - URL: https://framework.dev/ +# - Description: My favorite framework +# - Selectors: (uses defaults) +# - Rate limit: 0.5 +# - Max pages: 100 + +# 3. Enhance +python3 enhance_skill_local.py output/my-framework/ + +# 4. Package +python3 package_skill.py output/my-framework/ +``` + +### Workflow 4: Quick Mode + +```bash +python3 doc_scraper.py \ + --name vue \ + --url https://vuejs.org/ \ + --description "Vue.js framework" \ + --enhance-local +``` + +### Workflow 5: Rebuild from Cache + +```bash +# Already scraped once? +# Skip re-scraping, just rebuild +python3 doc_scraper.py --config configs/godot.json --skip-scrape + +# Try new enhancement +python3 enhance_skill_local.py output/godot/ + +# Re-package +python3 package_skill.py output/godot/ +``` + +### Workflow 6: Testing New Config + +```bash +# 1. Create test config with low max_pages +cat > configs/test.json << 'EOF' +{ + "name": "test-site", + "base_url": "https://docs.test.com/", + "max_pages": 20, + "rate_limit": 0.1 +} +EOF + +# 2. Estimate +python3 estimate_pages.py configs/test.json --max-discovery 50 + +# 3. Dry-run +python3 doc_scraper.py --config configs/test.json --dry-run + +# 4. Small scrape +python3 doc_scraper.py --config configs/test.json + +# 5. Validate output +ls output/test-site/ +ls output/test-site/references/ + +# 6. If good, increase max_pages and re-run +``` + +--- + +## Troubleshooting + +### Issue: "Rate limit exceeded" + +```bash +# Increase rate_limit in config +# Default: 0.5 seconds +# Conservative: 1.0 seconds +# Very conservative: 2.0 seconds + +# Edit config: +{ + "rate_limit": 1.0 +} +``` + +### Issue: "Too many pages" + +```bash +# Estimate first +python3 estimate_pages.py configs/my-config.json + +# Set max_pages based on estimate +# Add buffer: estimated + 50 + +# Edit config: +{ + "max_pages": 350 # for 300 estimated +} +``` + +### Issue: "No content extracted" + +```bash +# Wrong selectors +# Test selectors manually: +curl -s https://docs.example.com/ | grep -i 'article\|main\|content' + +# Common selectors: +"main_content": "article" +"main_content": "main" +"main_content": ".content" +"main_content": "#main-content" +"main_content": "div[role=\"main\"]" + +# Update config with correct selector +``` + +### Issue: "Tests failing" + +```bash +# Run specific failing test +python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v + +# Check error message +# Verify expectations match implementation +``` + +### Issue: "Enhancement fails" + +```bash +# Local enhancement: +# Make sure Claude Code is running +# Check terminal output + +# API enhancement: +# Verify API key is set: +echo $ANTHROPIC_API_KEY + +# Or use inline: +python3 enhance_skill.py output/react/ --api-key sk-ant-... +``` + +### Issue: "Package fails" + +```bash +# Verify SKILL.md exists +ls output/my-skill/SKILL.md + +# If missing, build first: +python3 doc_scraper.py --config configs/my-skill.json --skip-scrape +``` + +### Issue: "Can't find output" + +```bash +# Check output directory +ls output/ + +# Skill data (cached): +ls output/{name}_data/ + +# Built skill: +ls output/{name}/ + +# Packaged skill: +ls output/{name}.zip +``` + +--- + +## Advanced Usage + +### Custom Selectors + +```json +{ + "selectors": { + "main_content": "div.documentation", + "title": "h1.page-title", + "code_blocks": "pre.highlight code", + "navigation": "nav.sidebar" + } +} +``` + +### URL Pattern Filtering + +```json +{ + "url_patterns": { + "include": [ + "/docs/", + "/guide/", + "/api/", + "/tutorial/" + ], + "exclude": [ + "/blog/", + "/news/", + "/community/", + "/showcase/" + ] + } +} +``` + +### Custom Categories + +```json +{ + "categories": { + "getting_started": ["intro", "tutorial", "quickstart", "installation"], + "core_concepts": ["concept", "fundamental", "architecture"], + "api": ["reference", "api", "method", "function"], + "guides": ["guide", "how-to", "example"], + "advanced": ["advanced", "expert", "performance"] + } +} +``` + +### Multiple Start URLs + +```json +{ + "start_urls": [ + "https://docs.example.com/getting-started/", + "https://docs.example.com/api/", + "https://docs.example.com/guides/", + "https://docs.example.com/examples/" + ] +} +``` + +--- + +## Performance Tips + +1. **Estimate first**: Save 20-40 minutes by validating config +2. **Use dry-run**: Test selectors before full scrape +3. **Cache data**: Use `--skip-scrape` for fast rebuilds +4. **Adjust rate_limit**: Balance speed vs politeness +5. **Set appropriate max_pages**: Don't scrape more than needed +6. **Use start_urls**: Target specific documentation sections +7. **Filter URLs**: Use include/exclude patterns +8. **Run tests**: Catch issues early + +--- + +## Environment Variables + +```bash +# Anthropic API key (for API enhancement) +export ANTHROPIC_API_KEY=sk-ant-... + +# Optional: Set custom output directory +export SKILL_SEEKER_OUTPUT_DIR=/path/to/output +``` + +--- + +## Exit Codes + +- `0`: Success +- `1`: Error (general) +- `2`: Warning (estimation hit limit) + +--- + +## File Locations + +``` +Skill_Seekers/ +├── doc_scraper.py # Main tool +├── estimate_pages.py # Estimator +├── enhance_skill.py # API enhancement +├── enhance_skill_local.py # Local enhancement +├── package_skill.py # Packager +├── run_tests.py # Test runner +├── configs/ # Preset configs +├── tests/ # Test suite +├── docs/ # Documentation +└── output/ # Generated output +``` + +--- + +## Getting Help + +```bash +# Tool-specific help +python3 doc_scraper.py --help +python3 estimate_pages.py --help +python3 run_tests.py --help + +# Documentation +cat CLAUDE.md # Quick reference for Claude Code +cat docs/CLAUDE.md # Detailed technical docs +cat docs/TESTING.md # Testing guide +cat docs/USAGE.md # This file +cat docs/ENHANCEMENT.md # Enhancement guide +cat docs/UPLOAD_GUIDE.md # Upload instructions +cat README.md # Project overview +``` + +--- + +## Summary + +**Essential Commands:** +```bash +python3 estimate_pages.py configs/react.json # Estimate +python3 doc_scraper.py --config configs/react.json # Scrape +python3 enhance_skill_local.py output/react/ # Enhance +python3 package_skill.py output/react/ # Package +python3 run_tests.py # Test +``` + +**Quick Start:** +```bash +pip3 install requests beautifulsoup4 +python3 doc_scraper.py --config configs/react.json --enhance-local +python3 package_skill.py output/react/ +# Upload output/react.zip to Claude! +``` + +Happy skill creating! 🚀