CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
🎯 Current Status (December 30, 2025)
Version: v2.5.0 (Production Ready - Multi-Platform Feature Parity!)
Active Development: Flexible, incremental task-based approach
Recent Updates (December 2025):
🎉 MAJOR RELEASE: Multi-Platform Feature Parity (v2.5.0)
- 🌐 4 LLM Platforms: Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
- ✅ Complete Feature Parity: All skill modes work with all platforms
- 🔧 Platform Adaptors: Clean architecture with platform-specific implementations
- 📦 Smart Enhancement: Platform-specific AI models (Sonnet 4, Gemini 2.0, GPT-4o)
- 🧪 Test Coverage: 700+ tests passing across all platforms
- 📚 Unified Workflow: Same scraping output works for all platforms
🚀 Unified Multi-Source Scraping (v2.0.0)
- NEW: Combine documentation + GitHub + PDF in one skill
- NEW: Automatic conflict detection between docs and code
- NEW: Rule-based and AI-powered merging
- NEW: 5 example unified configs (React, Django, FastAPI, Godot, FastAPI-test)
- Status: ✅ All 22 unified tests passing (18 core + 4 MCP integration)
✅ Community Response (H1 Group):
- Issue #8 Fixed - Added BULLETPROOF_QUICKSTART.md and TROUBLESHOOTING.md for beginners
- Issue #7 Fixed - Fixed all 11 configs (Django, Laravel, Astro, Tailwind) - 100% working
- Issue #4 Linked - Connected to roadmap Tasks A2/A3 (knowledge sharing + website)
- PR #5 Reviewed - Approved anchor stripping feature (security verified, 32/32 tests pass)
- MCP Setup Fixed - Path expansion bug resolved in setup_mcp.sh
📦 Configs Status:
- ✅ 24 total configs available (including unified configs)
- ✅ 5 unified configs added (React, Django, FastAPI, Godot, FastAPI-test)
- ✅ Core selectors tested and validated
- 📝 Single-source configs: ansible-core, astro, claude-code, django, fastapi, godot, godot-large-example, hono, kubernetes, laravel, react, steam-economy-complete, tailwind, vue
- 📝 Multi-source configs: django_unified, fastapi_unified, fastapi_unified_test, godot_unified, react_unified
- 📝 Test/Example configs: godot_github, react_github, python-tutorial-test, example_pdf, test-manual
📋 Recent Completions (December 2025):
- ✅ DONE: Multi-platform support (v2.5.0) - 4 LLM platforms
- ✅ DONE: Platform adaptor architecture with clean separation
- ✅ DONE: Enhanced MCP tools with platform support (18 tools)
- ✅ DONE: Multi-platform CLI commands (package, upload, enhance)
- ✅ DONE: Test suite expanded to 700+ tests
- ✅ DONE: Complete feature parity across all platforms
📊 Roadmap Progress:
- 134 tasks organized into 22 feature groups
- Project board: https://github.com/users/yusufkaraaslan/projects/2
- See FLEXIBLE_ROADMAP.md for complete task list
🔌 MCP Integration Available
This repository includes a fully tested MCP server with 18 tools supporting 4 LLM platforms:
Core Tools (9):
- list_configs - List all available preset configurations
- generate_config - Generate new config for any docs site
- validate_config - Validate config file structure
- estimate_pages - Estimate page count before scraping
- scrape_docs - Scrape and build a skill
- package_skill - Package skill (supports --target: claude, gemini, openai, markdown)
- upload_skill - Upload to LLM platform (supports --target: claude, gemini, openai)
- enhance_skill - NEW! AI enhancement with platform support
- install_skill - Complete workflow (fetch → scrape → enhance → package → upload)
Extended Tools (9):
- scrape_github - Scrape GitHub repositories
- scrape_pdf - Extract from PDFs
- unified_scrape - Multi-source scraping
- merge_sources - Merge docs + code
- detect_conflicts - Find discrepancies
- split_config - Split large configs
- generate_router - Generate router skills
- add_config_source - Register git repos
- fetch_config - Fetch from git
Setup: See docs/MCP_SETUP.md or run ./setup_mcp.sh
Status: ✅ Tested with 5 AI agents (Claude Code, Cursor, Windsurf, VS Code + Cline, IntelliJ IDEA)
Overview
Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable .zip file for Claude.
Prerequisites
Python Version: Python 3.10 or higher (required for MCP integration)
Installation:
Option 1: Install from PyPI (Recommended - Easiest!)
# Install globally or in virtual environment
pip install skill-seekers
# Use the unified CLI immediately
skill-seekers scrape --config configs/react.json
skill-seekers --help
Option 2: Install from Source (For Development)
# Clone the repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # macOS/Linux (Windows: venv\Scripts\activate)
# Install in editable mode
pip install -e .
# Or install dependencies manually
pip install -r requirements.txt
Why use a virtual environment?
- Keeps dependencies isolated from system Python
- Prevents package version conflicts
- Standard Python development practice
- Required for running tests with pytest
Optional (for API-based enhancement):
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
Core Commands
Multi-Platform Support (NEW in v2.5.0)
# Package for different LLM platforms
skill-seekers package output/react/ --target claude # Default
skill-seekers package output/react/ --target gemini
skill-seekers package output/react/ --target openai
skill-seekers package output/react/ --target markdown
# Upload to platform
skill-seekers upload react-gemini.tar.gz --target gemini
skill-seekers upload react-openai.zip --target openai
# AI enhancement with platform-specific models
skill-seekers enhance output/react/ --target gemini --mode api
skill-seekers enhance output/react/ --target openai --mode api
Quick Start - Use a Preset
# Single-source scraping (documentation only)
skill-seekers scrape --config configs/godot.json
skill-seekers scrape --config configs/react.json
skill-seekers scrape --config configs/vue.json
skill-seekers scrape --config configs/django.json
skill-seekers scrape --config configs/laravel.json
skill-seekers scrape --config configs/fastapi.json
Unified Multi-Source Scraping (NEW - v2.0.0)
# Combine documentation + GitHub + PDF in one skill
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json
skill-seekers unified --config configs/fastapi_unified.json
skill-seekers unified --config configs/godot_unified.json
# Override merge mode
skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
# Result: One comprehensive skill with conflict detection
What makes it special:
- ✅ Detects discrepancies between documentation and code
- ✅ Shows both versions side-by-side with ⚠️ warnings
- ✅ Identifies outdated docs and undocumented features
- ✅ Single source of truth showing intent (docs) AND reality (code)
See full guide: docs/UNIFIED_SCRAPING.md
First-Time User Workflow (Recommended)
# 1. Install from PyPI (one-time, easiest!)
pip install skill-seekers
# 2. Estimate page count BEFORE scraping (fast, no data download)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes, shows estimated total pages and recommended max_pages
# 3. Scrape with local enhancement (uses Claude Code Max, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# Time: 20-40 minutes scraping + 60 seconds enhancement
# 4. Package the skill
skill-seekers package output/godot/
# Result: godot.zip ready to upload to Claude
NEW! One-Command Install Workflow (v2.1.1)
The fastest way to install a skill - complete automation from config to uploaded skill:
# Install React skill from official configs (auto-uploads to Claude)
skill-seekers install --config react
# Time: 20-45 minutes total (scraping 20-40 min + enhancement 60 sec + upload 5 sec)
# Install from local config file
skill-seekers install --config configs/custom.json
# Install without uploading (package only)
skill-seekers install --config django --no-upload
# Unlimited scraping (no page limits - WARNING: can take hours)
skill-seekers install --config godot --unlimited
# Preview workflow without executing
skill-seekers install --config react --dry-run
# Custom output directory
skill-seekers install --config vue --destination /tmp/skills
What it does automatically:
- ✅ Fetches config from API (if config name provided)
- ✅ Scrapes documentation
- ✅ AI Enhancement (MANDATORY) - 30-60 sec, quality boost from 3/10 → 9/10
- ✅ Packages skill to .zip
- ✅ Uploads to Claude (if ANTHROPIC_API_KEY set)
Why use this:
- Zero friction - One command instead of 5 separate steps
- Quality guaranteed - Enhancement is mandatory, ensures professional output
- Complete automation - From config name to uploaded skill
- Time savings - Fully automated workflow
Phases executed:
📥 PHASE 1: Fetch Config (if config name provided)
📖 PHASE 2: Scrape Documentation
✨ PHASE 3: AI Enhancement (MANDATORY - no skip option)
📦 PHASE 4: Package Skill
☁️ PHASE 5: Upload to Claude (optional)
Interactive Mode
# Step-by-step configuration wizard
skill-seekers scrape --interactive
Quick Mode (Minimal Config)
# Create skill from any documentation URL
skill-seekers scrape --name react --url https://react.dev/ --description "React framework for UIs"
Skip Scraping (Use Cached Data)
# Fast rebuild using previously scraped data
skill-seekers scrape --config configs/godot.json --skip-scrape
# Time: 1-3 minutes (instant rebuild)
Async Mode (2-3x Faster Scraping)
# Enable async mode with 8 workers for best performance
skill-seekers scrape --config configs/react.json --async --workers 8
# Quick mode with async
skill-seekers scrape --name react --url https://react.dev/ --async --workers 8
# Dry run with async to test
skill-seekers scrape --config configs/godot.json --async --workers 4 --dry-run
Recommended Settings:
- Small docs (~100-500 pages): --async --workers 4
- Medium docs (~500-2000 pages): --async --workers 8
- Large docs (2000+ pages): --async --workers 8 --no-rate-limit
Performance:
- Sync: ~18 pages/sec, 120 MB memory
- Async: ~55 pages/sec, 40 MB memory (3x faster!)
See full guide: ASYNC_SUPPORT.md
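The worker limit behind `--workers` can be pictured as a bounded pool of concurrent fetches. The sketch below is an illustration only: it assumes an `asyncio.Semaphore`-based design, and `fetch` and `scrape_async` are hypothetical names that stand in for the real scraper, which this document does not show. The network call is replaced by a short sleep so the example runs offline.

```python
import asyncio


async def fetch(url: str) -> str:
    """Hypothetical stand-in for a network request (no real I/O)."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def scrape_async(urls, workers: int = 8) -> dict:
    """Fetch all URLs with at most `workers` requests in flight at once."""
    semaphore = asyncio.Semaphore(workers)

    async def bounded(url: str):
        # The semaphore blocks here once `workers` fetches are running.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    pages = asyncio.run(scrape_async(["a", "b", "c"], workers=2))
    print(sorted(pages))
```

Raising `workers` increases concurrency at the cost of more load on the documentation server, which is why the larger settings above are paired with larger sites.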
Enhancement Options
LOCAL Enhancement (Recommended - No API Key Required):
# During scraping
skill-seekers scrape --config configs/react.json --enhance-local
# Standalone after scraping
skill-seekers enhance output/react/
API Enhancement (Alternative - Requires API Key):
# During scraping
skill-seekers scrape --config configs/react.json --enhance
# Standalone after scraping
skill-seekers-enhance output/react/
skill-seekers-enhance output/react/ --api-key sk-ant-...
Package and Upload the Skill
# Package skill (opens folder, shows upload instructions)
skill-seekers package output/godot/
# Result: output/godot.zip
# Package and auto-upload (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers package output/godot/ --upload
# Upload existing .zip
skill-seekers upload output/godot.zip
# Package without opening folder
skill-seekers package output/godot/ --no-open
Install to AI Agents
# Single agent installation
skill-seekers install-agent output/godot/ --agent cursor
# Install to all agents
skill-seekers install-agent output/godot/ --agent all
# Force overwrite
skill-seekers install-agent output/godot/ --agent claude --force
# Dry run (preview only)
skill-seekers install-agent output/godot/ --agent cursor --dry-run
Supported agents: claude, cursor, vscode, copilot, amp, goose, opencode, letta, aide, windsurf, all
Installation paths:
- Global agents (claude, amp, goose, etc.): Install to ~/.{agent}/skills/
- Project agents (cursor, vscode): Install to .{agent}/skills/ in current directory
Force Re-scrape
# Delete cached data and re-scrape from scratch
rm -rf output/godot_data/
skill-seekers scrape --config configs/godot.json
Estimate Page Count (Before Scraping)
# Quick estimation - discover up to 100 pages
skill-seekers estimate configs/react.json --max-discovery 100
# Time: ~30-60 seconds
# Full estimation - discover up to 1000 pages (default)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes
# Deep estimation - discover up to 2000 pages
skill-seekers estimate configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes
# What it shows:
# - Estimated total pages
# - Recommended max_pages value
# - Estimated scraping time
# - Discovery rate (pages/sec)
Why use estimation:
- Validates config URL patterns before full scrape
- Helps set optimal max_pages value
- Estimates total scraping time
- Fast (only HEAD requests + minimal parsing)
- No data downloaded or stored
Repository Architecture
File Structure (v2.5.0 - Multi-Platform Architecture)
Skill_Seekers/
├── pyproject.toml # Modern Python package configuration (PEP 621)
├── src/ # Source code (src/ layout best practice)
│   └── skill_seekers/
│       ├── __init__.py
│       ├── cli/ # CLI tools (entry points)
│       │   ├── main.py # Unified CLI dispatcher (Git-style)
│       │   ├── doc_scraper.py # Main scraper (~790 lines)
│       │   ├── estimate_pages.py # Page count estimator
│       │   ├── enhance_skill_local.py # AI enhancement (local)
│       │   ├── package_skill.py # Skill packager
│       │   ├── upload_skill.py # Upload to platforms
│       │   ├── install_skill.py # Complete workflow automation
│       │   ├── install_agent.py # Install to AI agent directories
│       │   ├── github_scraper.py # GitHub scraper
│       │   ├── pdf_scraper.py # PDF scraper
│       │   ├── unified_scraper.py # Unified multi-source scraper
│       │   ├── merge_sources.py # Source merger
│       │   ├── conflict_detector.py # Conflict detection
│       │   └── adaptors/ # Platform adaptor architecture
│       │       ├── __init__.py # Factory: get_adaptor(target)
│       │       ├── base_adaptor.py # Abstract base class
│       │       ├── claude_adaptor.py # Claude AI implementation
│       │       ├── gemini_adaptor.py # Google Gemini implementation
│       │       ├── openai_adaptor.py # OpenAI ChatGPT implementation
│       │       └── markdown_adaptor.py # Generic Markdown export
│       └── mcp/ # MCP server integration
│           ├── server.py # FastMCP-based server (stdio + HTTP)
│           └── tools/ # MCP tool implementations
├── tests/ # Test suite (700+ tests passing)
│   ├── test_scraper_features.py
│   ├── test_config_validation.py
│   ├── test_integration.py
│   ├── test_mcp_server.py
│   ├── test_mcp_fastmcp.py # FastMCP framework tests
│   ├── test_unified.py # Unified scraping tests
│   ├── test_install_multiplatform.py # Multi-platform tests
│   └── ... (40+ test files)
├── configs/ # Preset configurations (24 configs)
│   ├── godot.json
│   ├── react.json
│   ├── django_unified.json # Multi-source configs
│   └── ...
├── docs/ # Documentation
│   ├── CLAUDE.md # This file
│   ├── ENHANCEMENT.md # Enhancement guide
│   ├── UPLOAD_GUIDE.md # Upload instructions
│   └── UNIFIED_SCRAPING.md # Unified scraping guide
├── README.md # User documentation
├── CHANGELOG.md # Release history
├── FUTURE_RELEASES.md # Roadmap
└── output/ # Generated output (git-ignored)
    ├── {name}_data/ # Scraped raw data (cached)
    │   ├── pages/*.json # Individual page data
    │   └── summary.json # Scraping summary
    └── {name}/ # Built skill directory
        ├── SKILL.md # Main skill file
        ├── SKILL.md.backup # Backup (if enhanced)
        ├── references/ # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/ # Empty (user scripts)
        └── assets/ # Empty (user assets)
Key Changes in v2.5.0:
- Platform Adaptor Architecture: Clean separation for Claude, Gemini, OpenAI, Markdown
- Multi-platform CLI: --target flag on package/upload/enhance commands
- 18 MCP Tools: Extended from 9 to 18 tools with platform support
- src/ layout: Modern Python packaging structure
- pyproject.toml: PEP 621 compliant with optional platform dependencies
- Entry points: skill-seekers CLI with Git-style subcommands
- Published to PyPI: pip install skill-seekers + platform extras
Platform Adaptor Architecture (NEW in v2.5.0)
Design Pattern: Strategy pattern with factory method for platform-specific implementations
Key Components:
- BaseAdaptor (src/skill_seekers/cli/adaptors/base_adaptor.py): Abstract base class defining interface
  - package(skill_dir, output_path) - Package skill in platform-specific format
  - upload(package_path, api_key) - Upload to platform API
  - enhance(skill_dir, mode) - AI enhancement using platform-specific model
- Factory Function (src/skill_seekers/cli/adaptors/__init__.py):
  - get_adaptor(target: str) -> BaseAdaptor - Returns appropriate adaptor instance
  - Validates target and returns ClaudeAdaptor, GeminiAdaptor, OpenAIAdaptor, or MarkdownAdaptor
- Platform-Specific Implementations:
  - ClaudeAdaptor: ZIP + YAML frontmatter, Anthropic Skills API, Sonnet 4 enhancement
  - GeminiAdaptor: tar.gz, Google Files API + Grounding, Gemini 2.0 Flash enhancement
  - OpenAIAdaptor: ZIP + Assistant instructions, Assistants API + Vector Store, GPT-4o enhancement
  - MarkdownAdaptor: ZIP with pure markdown, manual upload, no enhancement
Usage Pattern:
import os
from skill_seekers.cli.adaptors import get_adaptor
# Get platform-specific adaptor
adaptor = get_adaptor('gemini') # or 'claude', 'openai', 'markdown'
# Package skill in platform format
adaptor.package(skill_dir='output/react/', output_path='output/')
# Upload to platform (if supported)
adaptor.upload(package_path='output/react-gemini.tar.gz', api_key=os.getenv('GOOGLE_API_KEY'))
# AI enhancement with platform-specific model
adaptor.enhance(skill_dir='output/react/', mode='api')
Benefits:
- ✅ Single codebase supports 4 platforms
- ✅ Platform-specific optimizations (format, APIs, models)
- ✅ Easy to add new platforms (implement BaseAdaptor)
- ✅ Clean separation of concerns
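To make the strategy-plus-factory arrangement concrete, here is a minimal standalone sketch. Only the names `BaseAdaptor`, `ClaudeAdaptor`, `GeminiAdaptor`, `package`, and `get_adaptor(target)` come from this document; the class bodies, the `suffix` attribute, the `_ADAPTORS` registry, and the error message are assumptions made for illustration, and nothing is actually written to disk.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseAdaptor(ABC):
    """Interface every platform adaptor implements (sketch only)."""

    suffix = ""  # hypothetical: archive suffix for this platform

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path:
        """Return the path the packaged skill would be written to."""


class ClaudeAdaptor(BaseAdaptor):
    suffix = ".zip"

    def package(self, skill_dir, output_path):
        return Path(output_path) / f"{Path(skill_dir).name}{self.suffix}"


class GeminiAdaptor(BaseAdaptor):
    suffix = "-gemini.tar.gz"

    def package(self, skill_dir, output_path):
        return Path(output_path) / f"{Path(skill_dir).name}{self.suffix}"


# Hypothetical registry; adding a platform means adding one entry here.
_ADAPTORS = {"claude": ClaudeAdaptor, "gemini": GeminiAdaptor}


def get_adaptor(target: str) -> BaseAdaptor:
    """Factory: validate the target name and return a fresh adaptor."""
    try:
        return _ADAPTORS[target]()
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None


if __name__ == "__main__":
    print(get_adaptor("gemini").package("output/react/", "output/"))
```

Callers depend only on the `BaseAdaptor` interface, so supporting a new platform in a design like this would not require touching the CLI code that calls `get_adaptor`.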
Data Flow
- Scrape Phase (scrape_all() in src/skill_seekers/cli/doc_scraper.py):
  - Input: Config JSON (name, base_url, selectors, url_patterns, categories)
  - Process: BFS traversal from base_url, respecting include/exclude patterns
  - Output: output/{name}_data/pages/*.json + summary.json
- Build Phase (build_skill() in src/skill_seekers/cli/doc_scraper.py):
  - Input: Scraped JSON data from output/{name}_data/
  - Process: Load pages → Smart categorize → Extract patterns → Generate references
  - Output: output/{name}/SKILL.md + output/{name}/references/*.md
- Enhancement Phase (optional, platform-aware via adaptors):
  - Input: Built skill directory with references
  - Process: Platform-specific LLM analyzes references and rewrites SKILL.md
  - Output: Enhanced SKILL.md with real examples and guidance
  - Models: Claude Sonnet 4, Gemini 2.0 Flash, or GPT-4o (depending on target)
- Package Phase (platform-aware via adaptors):
  - Input: Skill directory + target platform
  - Process: Platform adaptor packages in appropriate format
  - Output: {name}.zip, {name}-gemini.tar.gz, {name}-openai.zip, or {name}-markdown.zip
- Upload Phase (optional, platform-aware via adaptors):
  - Input: Platform-specific package + API key
  - Process: Upload via platform API (Anthropic Skills, Google Files, OpenAI Assistants)
  - Output: Skill available in target LLM platform
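The BFS traversal in the scrape phase can be sketched over an in-memory link graph. This is an illustration under stated assumptions, not the code in doc_scraper.py: `url_allowed` and `crawl` are hypothetical names, patterns are treated as plain substrings, and the `links` dictionary replaces real HTTP fetching and link extraction.

```python
from collections import deque


def url_allowed(url, base_url, include, exclude):
    """Keep URLs under base_url that match an include pattern (if any
    are given) and match no exclude pattern."""
    if not url.startswith(base_url):
        return False
    if include and not any(p in url for p in include):
        return False
    return not any(p in url for p in exclude)


def crawl(base_url, links, include=(), exclude=(), max_pages=500):
    """Breadth-first walk; `links` maps each URL to the URLs it links to."""
    queue = deque([base_url])
    visited = {base_url}
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # a real scraper would fetch and save the page here
        for nxt in links.get(url, []):
            if nxt not in visited and url_allowed(nxt, base_url, include, exclude):
                visited.add(nxt)
                queue.append(nxt)
    return order
```

Marking a URL as visited when it is queued, rather than when it is processed, keeps the same page from being queued twice when several pages link to it.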
Configuration File Structure
Config files (configs/*.json) define scraping behavior:
{
"name": "godot",
"description": "When to use this skill",
"base_url": "https://docs.godotengine.org/en/stable/",
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/search.html", "/_static/"]
},
"categories": {
"getting_started": ["introduction", "getting_started"],
"scripting": ["scripting", "gdscript"],
"api": ["api", "reference", "class"]
},
"rate_limit": 0.5,
"max_pages": 500
}
Config Parameters:
- name: Skill identifier (output directory name)
- description: When Claude should use this skill
- base_url: Starting URL for scraping
- selectors.main_content: CSS selector for main content (common: article, main, div[role="main"])
- selectors.title: CSS selector for page title
- selectors.code_blocks: CSS selector for code samples
- url_patterns.include: Only scrape URLs containing these patterns
- url_patterns.exclude: Skip URLs containing these patterns
- categories: Keyword mapping for categorization
- rate_limit: Delay between requests (seconds)
- max_pages: Maximum pages to scrape
- skip_llms_txt: Skip llms.txt detection, force HTML scraping (default: false)
- exclude_dirs_additional: Add custom directories to default exclusions (for local repo analysis)
- exclude_dirs: Replace default directory exclusions entirely (advanced, for local repo analysis)
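A quick sanity check on a config dictionary might look like the sketch below. The function name `check_config`, the choice of required keys, and the specific rules are assumptions for illustration; the project's own validate_config tool may check more or differently.

```python
# Assumption: only these two keys are treated as required in this sketch.
REQUIRED_KEYS = ("name", "base_url")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    errors = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            errors.append(f"missing required key: {key}")

    base_url = config.get("base_url", "")
    if base_url and not base_url.startswith(("http://", "https://")):
        errors.append("base_url must start with http:// or https://")

    rate_limit = config.get("rate_limit")
    if rate_limit is not None and rate_limit < 0:
        errors.append("rate_limit must be zero or positive")

    max_pages = config.get("max_pages")
    if max_pages is not None and (not isinstance(max_pages, int) or max_pages < 1):
        errors.append("max_pages must be a positive integer")
    return errors
```

Returning a list of problems instead of raising on the first one lets a caller report everything wrong with a config in a single pass.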
Key Features & Implementation
Auto-Detect Existing Data
Tool checks for output/{name}_data/ and prompts to reuse, avoiding re-scraping (check_existing_data() in doc_scraper.py:653-660).
Configurable Directory Exclusions (Local Repository Analysis)
When using local_repo_path for unlimited local repository analysis, you can customize which directories to exclude from analysis.
Smart Defaults:
Automatically excludes common directories: venv, node_modules, __pycache__, .git, build, dist, .pytest_cache, htmlcov, .tox, .mypy_cache, etc.
Extend Mode (exclude_dirs_additional): Add custom exclusions to defaults
{
"sources": [{
"type": "github",
"local_repo_path": "/path/to/repo",
"exclude_dirs_additional": ["proprietary", "legacy", "third_party"]
}]
}
Replace Mode (exclude_dirs): Override defaults entirely (advanced)
{
"sources": [{
"type": "github",
"local_repo_path": "/path/to/repo",
"exclude_dirs": ["node_modules", ".git", "custom_vendor"]
}]
}
Use Cases:
- Monorepos with custom directory structures
- Enterprise projects with non-standard naming
- Including unusual directories (e.g., analyzing venv code)
- Minimal exclusions for small/simple projects
See: should_exclude_dir() in github_scraper.py:304-306
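The difference between extend mode and replace mode can be sketched as follows. `resolve_excluded_dirs` and `DEFAULT_EXCLUDED_DIRS` are hypothetical names, the default set lists only the directories named above, and letting `exclude_dirs` take precedence when both keys are present is an assumption rather than documented behavior.

```python
# Only the defaults named in this document; the real list may be longer.
DEFAULT_EXCLUDED_DIRS = frozenset({
    "venv", "node_modules", "__pycache__", ".git", "build", "dist",
    ".pytest_cache", "htmlcov", ".tox", ".mypy_cache",
})


def resolve_excluded_dirs(source: dict) -> set[str]:
    """Compute the effective exclusion set for one source entry."""
    if "exclude_dirs" in source:
        # Replace mode: the defaults are discarded entirely.
        return set(source["exclude_dirs"])
    # Extend mode: defaults plus any additional directories.
    return set(DEFAULT_EXCLUDED_DIRS) | set(source.get("exclude_dirs_additional", []))
```

With replace mode, a directory such as venv is analyzed unless it is listed again explicitly, which is what makes the "analyzing venv code" use case possible.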
Language Detection
Detects code languages from:
- CSS class attributes (language-*, lang-*)
- Heuristics (keywords like def, const, func, etc.)
See: detect_language() in doc_scraper.py:135-165
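The two-step detection can be sketched as a class-prefix check followed by a keyword fallback. `guess_language` is a hypothetical name, and the keyword-to-language mapping below is illustrative rather than the one used by detect_language().

```python
def guess_language(css_classes, code):
    """Prefer an explicit language-*/lang-* class, else fall back to keywords."""
    for cls in css_classes:
        for prefix in ("language-", "lang-"):
            if cls.startswith(prefix):
                return cls[len(prefix):]

    # Heuristic fallback; order matters because keywords can overlap.
    if "def " in code or "import " in code:
        return "python"
    if "const " in code or "function " in code:
        return "javascript"
    if "func " in code:
        return "go"
    return "unknown"
```

The class attribute is checked first because it is an explicit statement by the documentation author, while keyword matching is only a guess.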
Pattern Extraction
Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
See: extract_patterns() in doc_scraper.py:167-183
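As a rough sketch of the marker-based extraction, the function below walks a page represented as an ordered list of `(kind, text)` blocks and keeps each code block that directly follows a text block containing a marker. The block representation, the name `collect_patterns`, and the returned dictionary shape are assumptions; the real extract_patterns() works on parsed HTML.

```python
MARKERS = ("Example:", "Pattern:", "Usage:")


def collect_patterns(blocks, limit=5):
    """Collect up to `limit` code blocks that follow a marker paragraph."""
    patterns = []
    previous_text = ""
    for kind, text in blocks:
        if kind == "code" and any(m in previous_text for m in MARKERS):
            patterns.append({"description": previous_text, "code": text})
            if len(patterns) == limit:
                break
        # Only a text block immediately before the code counts as its lead-in.
        previous_text = text if kind == "text" else ""
    return patterns
```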
Smart Categorization
- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
- Threshold of 2+ for categorization
- Auto-infers categories from URL segments if none provided
- Falls back to "other" category
See: smart_categorize() and infer_categories() in doc_scraper.py:282-351
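The scoring rule above (3 points for a URL match, 2 for a title match, 1 for a content match, threshold of 2) can be sketched as follows. `categorize` is a hypothetical name, and choosing the highest-scoring category, with the first one winning ties, is an assumption about how competing categories are resolved.

```python
def categorize(page, categories, threshold=2):
    """Return the best-scoring category name, or "other" below the threshold."""
    url = page["url"].lower()
    title = page["title"].lower()
    content = page["content"].lower()

    best, best_score = "other", 0
    for name, keywords in categories.items():
        score = 0
        for keyword in keywords:
            keyword = keyword.lower()
            if keyword in url:
                score += 3
            if keyword in title:
                score += 2
            if keyword in content:
                score += 1
        if score >= threshold and score > best_score:
            best, best_score = name, score
    return best
```

With a threshold of 2, a keyword that appears only in the page body scores 1 and is not enough on its own, so a passing mention does not pull a page into a category.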
Enhanced SKILL.md Generation
Generated with:
- Real code examples from documentation (language-annotated)
- Quick reference patterns extracted from docs
- Common pattern section
- Category file listings
See: create_enhanced_skill_md() in doc_scraper.py:426-542
Common Workflows
First Time (With Scraping + Enhancement)
# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# 2. Wait for enhancement terminal to close (~60 seconds)
# 3. Verify quality
cat output/godot/SKILL.md
# 4. Package
skill-seekers package output/godot/
# Result: godot.zip ready for Claude
# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)
Using Cached Data (Fast Iteration)
# 1. Use existing data + Local Enhancement
skill-seekers scrape --config configs/godot.json --skip-scrape
skill-seekers enhance output/godot/
# 2. Package
skill-seekers package output/godot/
# Time: 1-3 minutes (build) + 60 seconds (enhancement)
Without Enhancement (Basic)
# 1. Scrape + Build (no enhancement)
skill-seekers scrape --config configs/godot.json
# 2. Package
skill-seekers package output/godot/
# Note: SKILL.md will be basic template - enhancement recommended
# Time: 20-40 minutes
Creating a New Framework Config
Option 1: Interactive
skill-seekers scrape --interactive
# Follow prompts, it creates the config for you
Option 2: Copy and Modify
# Copy a preset
cp configs/react.json configs/myframework.json
# Edit it
nano configs/myframework.json
# Test with limited pages first
# Set "max_pages": 20 in config
# Use it
skill-seekers scrape --config configs/myframework.json
Testing & Verification
Finding the Right CSS Selectors
Before creating a config, test selectors with BeautifulSoup:
from bs4 import BeautifulSoup
import requests
url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('div.content'))
# Test code block selector
print(soup.select('pre code'))
print(soup.select('pre'))
Verify Output Quality
After building, verify the skill quality:
# Check SKILL.md has real examples
cat output/godot/SKILL.md
# Check category structure
cat output/godot/references/index.md
# List all reference files
ls output/godot/references/
# Check specific category content
cat output/godot/references/getting_started.md
# Verify code samples have language detection
grep -A 3 '```' output/godot/references/*.md | head -20
Test with Limited Pages
For faster testing, edit the config to limit the scrape to a small number of pages, for example 20:
{
"max_pages": 20
}
Troubleshooting
No Content Extracted
Problem: Pages scraped but content is empty
Solution: Check main_content selector in config. Try:
- article
- main
- div[role="main"]
- div.content
Use the BeautifulSoup testing approach above to find the right selector.
Poor Categorization
Problem: Pages not categorized well
Solution: Edit categories section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data:
# See what URLs were scraped
cat output/godot_data/summary.json | grep url | head -20
Data Exists But Won't Use It
Problem: Tool won't reuse existing data
Solution: Force re-scrape:
rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json
Rate Limiting Issues
Problem: Getting rate limited or blocked by documentation server
Solution: Increase the rate_limit value in the config, for example from 0.5 to 1.0 seconds:
{
"rate_limit": 1.0
}
Package Path Error
Problem: doc_scraper.py shows wrong cli/package_skill.py path
Expected output:
skill-seekers package output/godot/
Not:
python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/godot/
The correct command uses the local cli/package_skill.py in the repository root.
Key Code Locations (v2.0.0)
Documentation Scraper (src/skill_seekers/cli/doc_scraper.py):
- URL validation: is_valid_url()
- Content extraction: extract_content()
- Language detection: detect_language()
- Pattern extraction: extract_patterns()
- Smart categorization: smart_categorize()
- Category inference: infer_categories()
- Quick reference generation: generate_quick_reference()
- SKILL.md generation: create_enhanced_skill_md()
- Scraping loop: scrape_all()
- Main workflow: main()
Other Key Files:
- GitHub scraper: src/skill_seekers/cli/github_scraper.py
- PDF scraper: src/skill_seekers/cli/pdf_scraper.py
- Unified scraper: src/skill_seekers/cli/unified_scraper.py
- Conflict detection: src/skill_seekers/cli/conflict_detector.py
- Source merger: src/skill_seekers/cli/merge_sources.py
- Package tool: src/skill_seekers/cli/package_skill.py
- Upload tool: src/skill_seekers/cli/upload_skill.py
- MCP server: src/skill_seekers/mcp/server.py
- Entry points: pyproject.toml (project.scripts section)
Enhancement Details
LOCAL Enhancement (Recommended)
- Uses your Claude Code Max plan (no API costs)
- Opens new terminal with Claude Code
- Analyzes reference files automatically
- Takes 30-60 seconds
- Quality: 9/10 (comparable to API version)
- Backs up original SKILL.md to SKILL.md.backup
API Enhancement (Alternative)
- Uses Anthropic API (~$0.15-$0.30 per skill)
- Requires ANTHROPIC_API_KEY
- Same quality as LOCAL
- Faster (no terminal launch)
- Better for automation/CI
What Enhancement Does:
- Reads reference documentation files
- Analyzes content with Claude
- Extracts 5-10 best code examples
- Creates comprehensive quick reference
- Adds domain-specific key concepts
- Provides navigation guidance for different skill levels
- Transforms 75-line templates into 500+ line comprehensive guides
Performance
| Task | Time | Notes |
|---|---|---|
| Scraping | 15-45 min | First time only |
| Building | 1-3 min | Fast! |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final zip |
Available Configs (24 Total)
Single-Source Documentation Configs (14 configs)
Web Frameworks:
- ✅ react.json - React (article selector, 7,102 chars)
- ✅ vue.json - Vue.js (main selector, 1,029 chars)
- ✅ astro.json - Astro (article selector, 145 chars)
- ✅ django.json - Django (article selector, 6,468 chars)
- ✅ laravel.json - Laravel 9.x (#main-content selector, 16,131 chars)
- ✅ fastapi.json - FastAPI (article selector, 11,906 chars)
- ✅ hono.json - Hono web framework NEW!
DevOps & Automation:
- ✅ ansible-core.json - Ansible Core 2.19 (div[role='main'] selector, ~32K chars)
- ✅ kubernetes.json - Kubernetes (main selector, 2,100 chars)
Game Engines:
- ✅ godot.json - Godot (div[role='main'] selector, 1,688 chars)
- ✅ godot-large-example.json - Godot large docs example
CSS & Utilities:
- ✅ tailwind.json - Tailwind CSS (div.prose selector, 195 chars)
Gaming:
- ✅ steam-economy-complete.json - Steam Economy (div.documentation_bbcode, 588 chars)
Development Tools:
- ✅ claude-code.json - Claude Code documentation NEW!
Unified Multi-Source Configs (5 configs - NEW v2.0!)
- ✅ react_unified.json - React (docs + GitHub + code analysis)
- ✅ django_unified.json - Django (docs + GitHub + code analysis)
- ✅ fastapi_unified.json - FastAPI (docs + GitHub + code analysis)
- ✅ fastapi_unified_test.json - FastAPI test config
- ✅ godot_unified.json - Godot (docs + GitHub + code analysis)
Test/Example Configs (5 configs)
- 📝 godot_github.json - GitHub-only scraping example
- 📝 react_github.json - GitHub-only scraping example
- 📝 python-tutorial-test.json - Python tutorial test
- 📝 example_pdf.json - PDF extraction example
- 📝 test-manual.json - Manual testing config
Note: All configs verified and working! Unified configs fully tested with 22 passing tests. Last verified: November 29, 2025 (Post-v2.1.0 bug fixes)
Additional Documentation
User Guides:
- README.md - Complete user documentation
- BULLETPROOF_QUICKSTART.md - Complete beginner guide
- QUICKSTART.md - Get started in 3 steps
- TROUBLESHOOTING.md - Comprehensive troubleshooting
Technical Documentation:
- docs/CLAUDE.md - Detailed technical architecture
- docs/ENHANCEMENT.md - AI enhancement guide
- docs/UPLOAD_GUIDE.md - How to upload skills to Claude
- docs/UNIFIED_SCRAPING.md - Multi-source scraping guide
- docs/MCP_SETUP.md - MCP server setup
Project Planning:
- CHANGELOG.md - Release history and v2.0.0 details UPDATED!
- FUTURE_RELEASES.md - Roadmap for v2.1.0+ NEW!
- FLEXIBLE_ROADMAP.md - Complete task catalog (134 tasks)
- NEXT_TASKS.md - What to work on next
- TODO.md - Current focus
- STRUCTURE.md - Repository structure
Notes for Claude Code
Project Status (v2.0.0):
- ✅ Published on PyPI: Install with pip install skill-seekers
- ✅ Modern Python Packaging: pyproject.toml, src/ layout, entry points
- ✅ Unified CLI: Single skill-seekers command with Git-style subcommands
- ✅ CI/CD Working: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12)
- ✅ Test Coverage: 391 tests passing, 39% coverage
- ✅ Documentation: Complete user and technical documentation
Architecture:
- Python-based documentation scraper with multi-source support
- Main scraper: src/skill_seekers/cli/doc_scraper.py (~790 lines)
- Unified scraping: Combines docs + GitHub + PDF with conflict detection
- Modern packaging: PEP 621 compliant with proper dependency management
- MCP Integration: 9 tools for Claude Code Max integration
CLI Architecture (Git-style subcommands):
- Entry point: src/skill_seekers/cli/main.py - Unified CLI dispatcher
- Subcommands: scrape, github, pdf, unified, enhance, package, upload, estimate
- Design pattern: Main CLI routes to individual tool entry points (delegates to existing main() functions)
- Backward compatibility: Individual tools (skill-seekers-scrape, etc.) still work directly
- Key insight: The unified CLI modifies sys.argv and calls existing main() functions to maintain compatibility
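The sys.argv rewriting described above can be sketched as a small dispatcher. `scrape_main`, `package_main`, `SUBCOMMANDS`, and `dispatch` are hypothetical stand-ins, not the contents of main.py, and restoring the original sys.argv afterwards is a choice made in this sketch rather than documented behavior.

```python
import sys


def scrape_main():
    """Stand-in for an existing tool's main(), which reads sys.argv itself."""
    return ("scrape", sys.argv[1:])


def package_main():
    return ("package", sys.argv[1:])


SUBCOMMANDS = {"scrape": scrape_main, "package": package_main}


def dispatch(argv):
    """Route `skill-seekers <command> ...` to the matching tool's main()."""
    if not argv or argv[0] not in SUBCOMMANDS:
        raise SystemExit(f"usage: skill-seekers {{{','.join(SUBCOMMANDS)}}} ...")
    command, rest = argv[0], argv[1:]

    saved = sys.argv
    # Make the tool see the same argv it would get when invoked directly.
    sys.argv = [f"skill-seekers-{command}", *rest]
    try:
        return SUBCOMMANDS[command]()
    finally:
        sys.argv = saved
```

Because each tool keeps parsing its own arguments, the standalone entry points and the unified command share one code path per tool.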
Development Workflow:
- Install: pip install -e . (editable mode for development)
  # Install with all platform dependencies
  pip install -e ".[all-llms]"
  # Or install specific platforms
  pip install -e ".[gemini]" # Google Gemini support
  pip install -e ".[openai]" # OpenAI ChatGPT support
- Run tests:
  # All tests
  pytest tests/ -v
  # Specific test file
  pytest tests/test_scraper_features.py -v
  # Multi-platform tests
  pytest tests/test_install_multiplatform.py -v
  # With coverage
  pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
  # Single test
  pytest tests/test_scraper_features.py::test_detect_language -v
  # MCP server tests
  pytest tests/test_mcp_fastmcp.py -v
- Build package:
  # Using uv (recommended)
  uv build
  # Or using build
  python -m build
- Publish:
  # To PyPI
  uv publish
  # Or using twine
  python -m twine upload dist/*
- Test CLI commands:
  # Test scraping (dry run)
  skill-seekers scrape --config configs/react.json --dry-run
  # Test multi-platform packaging
  skill-seekers package output/react/ --target gemini --dry-run
  # Test MCP server (stdio mode)
  python -m skill_seekers.mcp.server
  # Test MCP server (HTTP mode)
  python -m skill_seekers.mcp.server --transport http --port 8765
Test Architecture:
- Test files: 40+ test files covering all features (see tests/ directory)
- CI Matrix: Tests run on Ubuntu + macOS with Python 3.10, 3.11, 3.12, 3.13
- Coverage: 700+ tests passing across all platforms
- Key test categories:
  - test_scraper_features.py - Core scraping functionality
  - test_mcp_server.py - MCP integration (18 tools)
  - test_mcp_fastmcp.py - FastMCP framework and HTTP transport
  - test_unified.py - Multi-source scraping
  - test_github_scraper.py - GitHub repository analysis
  - test_pdf_scraper.py - PDF extraction
  - test_install_multiplatform.py - NEW Multi-platform packaging and upload
  - test_integration.py - End-to-end workflows
  - test_install_skill.py - One-command install workflow
  - test_install_agent.py - AI agent installation
- IMPORTANT: Must run pip install -e . before tests (src/ layout requirement)
- Platform Tests: Each platform adaptor has dedicated test coverage
Environment Variables & API Keys:
# Claude AI (default platform)
export ANTHROPIC_API_KEY=sk-ant-...
# Google Gemini (optional)
export GOOGLE_API_KEY=AIza...
# OpenAI ChatGPT (optional)
export OPENAI_API_KEY=sk-...
# GitHub (for higher rate limits in repo scraping)
export GITHUB_TOKEN=ghp_...
# Private config repositories (optional)
export GITLAB_TOKEN=glpat-...
export GITEA_TOKEN=...
export BITBUCKET_TOKEN=...
Key Points:
- Output is cached and reusable in output/ (git-ignored)
- Enhancement is optional but highly recommended
- All 24 configs are working and tested
- CI workflow requires pip install -e . to install package before running tests
- Never skip tests - all tests must pass before commits (per user instructions)
- Platform-specific dependencies are optional: use pip install skill-seekers[gemini] or pip install skill-seekers[openai] as needed