Files
skill-seekers-reference/CLAUDE.md
yusyus 5e166c40b9 chore: Bump version to v2.5.1 - Critical PyPI Bug Fix
Version Updates:
- pyproject.toml: 2.5.0 → 2.5.1
- src/skill_seekers/__init__.py: 2.0.0 → 2.5.1
- src/skill_seekers/cli/__init__.py: 2.0.0 → 2.5.1
- src/skill_seekers/cli/main.py: 2.4.0 → 2.5.1
- src/skill_seekers/mcp/__init__.py: 2.4.0 → 2.5.1
- src/skill_seekers/mcp/tools/__init__.py: 2.4.0 → 2.5.1

CHANGELOG:
- Added v2.5.1 release notes documenting PR #221 fix
- Critical: Fixed missing skill_seekers.cli.adaptors package
- Impact: Restores all multi-platform features for PyPI users

Documentation:
- Updated CLAUDE.md to v2.5.0 with multi-platform details
- Added platform adaptor architecture documentation
- Updated test architecture and environment variables

Related: PR #221 (merged), Issue #222 (py.typed follow-up)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 23:22:30 +03:00

39 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

🎯 Current Status (December 30, 2025)

Version: v2.5.0 (Production Ready - Multi-Platform Feature Parity!) Active Development: Flexible, incremental task-based approach

Recent Updates (December 2025):

🎉 MAJOR RELEASE: Multi-Platform Feature Parity (v2.5.0)

  • 🌐 4 LLM Platforms: Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
  • Complete Feature Parity: All skill modes work with all platforms
  • 🔧 Platform Adaptors: Clean architecture with platform-specific implementations
  • 📦 Smart Enhancement: Platform-specific AI models (Sonnet 4, Gemini 2.0, GPT-4o)
  • 🧪 Test Coverage: 700+ tests passing across all platforms
  • 📚 Unified Workflow: Same scraping output works for all platforms

🚀 Unified Multi-Source Scraping (v2.0.0)

  • NEW: Combine documentation + GitHub + PDF in one skill
  • NEW: Automatic conflict detection between docs and code
  • NEW: Rule-based and AI-powered merging
  • NEW: 5 example unified configs (React, Django, FastAPI, Godot, FastAPI-test)
  • Status: All 22 unified tests passing (18 core + 4 MCP integration)

Community Response (H1 Group):

  • Issue #8 Fixed - Added BULLETPROOF_QUICKSTART.md and TROUBLESHOOTING.md for beginners
  • Issue #7 Fixed - Fixed all 11 configs (Django, Laravel, Astro, Tailwind) - 100% working
  • Issue #4 Linked - Connected to roadmap Tasks A2/A3 (knowledge sharing + website)
  • PR #5 Reviewed - Approved anchor stripping feature (security verified, 32/32 tests pass)
  • MCP Setup Fixed - Path expansion bug resolved in setup_mcp.sh

📦 Configs Status:

  • 24 total configs available (including unified configs)
  • 5 unified configs added (React, Django, FastAPI, Godot, FastAPI-test)
  • Core selectors tested and validated
  • 📝 Single-source configs: ansible-core, astro, claude-code, django, fastapi, godot, godot-large-example, hono, kubernetes, laravel, react, steam-economy-complete, tailwind, vue
  • 📝 Multi-source configs: django_unified, fastapi_unified, fastapi_unified_test, godot_unified, react_unified
  • 📝 Test/Example configs: godot_github, react_github, python-tutorial-test, example_pdf, test-manual

📋 Recent Completions (December 2025):

  • DONE: Multi-platform support (v2.5.0) - 4 LLM platforms
  • DONE: Platform adaptor architecture with clean separation
  • DONE: Enhanced MCP tools with platform support (18 tools)
  • DONE: Multi-platform CLI commands (package, upload, enhance)
  • DONE: Test suite expanded to 700+ tests
  • DONE: Complete feature parity across all platforms

📊 Roadmap Progress:


🔌 MCP Integration Available

This repository includes a fully tested MCP server with 18 tools supporting 4 LLM platforms:

Core Tools (9):

  • list_configs - List all available preset configurations
  • generate_config - Generate new config for any docs site
  • validate_config - Validate config file structure
  • estimate_pages - Estimate page count before scraping
  • scrape_docs - Scrape and build a skill
  • package_skill - Package skill (supports --target: claude, gemini, openai, markdown)
  • upload_skill - Upload to LLM platform (supports --target: claude, gemini, openai)
  • enhance_skill - NEW! AI enhancement with platform support
  • install_skill - Complete workflow (fetch → scrape → enhance → package → upload)

Extended Tools (9):

  • scrape_github - Scrape GitHub repositories
  • scrape_pdf - Extract from PDFs
  • unified_scrape - Multi-source scraping
  • merge_sources - Merge docs + code
  • detect_conflicts - Find discrepancies
  • split_config - Split large configs
  • generate_router - Generate router skills
  • add_config_source - Register git repos
  • fetch_config - Fetch from git

Setup: See docs/MCP_SETUP.md or run ./setup_mcp.sh

Status: Tested with 5 AI agents (Claude Code, Cursor, Windsurf, VS Code + Cline, IntelliJ IDEA)

Overview

Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable .zip file for Claude.

Prerequisites

Python Version: Python 3.10 or higher (required for MCP integration)

Installation:

# Install globally or in virtual environment
pip install skill-seekers

# Use the unified CLI immediately
skill-seekers scrape --config configs/react.json
skill-seekers --help

Option 2: Install from Source (For Development)

# Clone the repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # macOS/Linux (Windows: venv\Scripts\activate)

# Install in editable mode
pip install -e .

# Or install dependencies manually
pip install -r requirements.txt

Why use a virtual environment?

  • Keeps dependencies isolated from system Python
  • Prevents package version conflicts
  • Standard Python development practice
  • Required for running tests with pytest

Optional (for API-based enhancement):

pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...

Core Commands

Multi-Platform Support (NEW in v2.5.0)

# Package for different LLM platforms
skill-seekers package output/react/ --target claude    # Default
skill-seekers package output/react/ --target gemini
skill-seekers package output/react/ --target openai
skill-seekers package output/react/ --target markdown

# Upload to platform
skill-seekers upload react-gemini.tar.gz --target gemini
skill-seekers upload react-openai.zip --target openai

# AI enhancement with platform-specific models
skill-seekers enhance output/react/ --target gemini --mode api
skill-seekers enhance output/react/ --target openai --mode api

Quick Start - Use a Preset

# Single-source scraping (documentation only)
skill-seekers scrape --config configs/godot.json
skill-seekers scrape --config configs/react.json
skill-seekers scrape --config configs/vue.json
skill-seekers scrape --config configs/django.json
skill-seekers scrape --config configs/laravel.json
skill-seekers scrape --config configs/fastapi.json

Unified Multi-Source Scraping (NEW - v2.0.0)

# Combine documentation + GitHub + PDF in one skill
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json
skill-seekers unified --config configs/fastapi_unified.json
skill-seekers unified --config configs/godot_unified.json

# Override merge mode
skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced

# Result: One comprehensive skill with conflict detection

What makes it special:

  • Detects discrepancies between documentation and code
  • Shows both versions side-by-side with ⚠️ warnings
  • Identifies outdated docs and undocumented features
  • Single source of truth showing intent (docs) AND reality (code)

See full guide: docs/UNIFIED_SCRAPING.md

# 1. Install from PyPI (one-time, easiest!)
pip install skill-seekers

# 2. Estimate page count BEFORE scraping (fast, no data download)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes, shows estimated total pages and recommended max_pages

# 3. Scrape with local enhancement (uses Claude Code Max, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# Time: 20-40 minutes scraping + 60 seconds enhancement

# 4. Package the skill
skill-seekers package output/godot/

# Result: godot.zip ready to upload to Claude

NEW! One-Command Install Workflow (v2.1.1)

The fastest way to install a skill - complete automation from config to uploaded skill:

# Install React skill from official configs (auto-uploads to Claude)
skill-seekers install --config react
# Time: 20-45 minutes total (scraping 20-40 min + enhancement 60 sec + upload 5 sec)

# Install from local config file
skill-seekers install --config configs/custom.json

# Install without uploading (package only)
skill-seekers install --config django --no-upload

# Unlimited scraping (no page limits - WARNING: can take hours)
skill-seekers install --config godot --unlimited

# Preview workflow without executing
skill-seekers install --config react --dry-run

# Custom output directory
skill-seekers install --config vue --destination /tmp/skills

What it does automatically:

  1. Fetches config from API (if config name provided)
  2. Scrapes documentation
  3. AI Enhancement (MANDATORY) - 30-60 sec, quality boost from 3/10 → 9/10
  4. Packages skill to .zip
  5. Uploads to Claude (if ANTHROPIC_API_KEY set)

Why use this:

  • Zero friction - One command instead of 5 separate steps
  • Quality guaranteed - Enhancement is mandatory, ensures professional output
  • Complete automation - From config name to uploaded skill
  • Time savings - Fully automated workflow

Phases executed:

📥 PHASE 1: Fetch Config (if config name provided)
📖 PHASE 2: Scrape Documentation
✨ PHASE 3: AI Enhancement (MANDATORY - no skip option)
📦 PHASE 4: Package Skill
☁️  PHASE 5: Upload to Claude (optional)

Interactive Mode

# Step-by-step configuration wizard
skill-seekers scrape --interactive

Quick Mode (Minimal Config)

# Create skill from any documentation URL
skill-seekers scrape --name react --url https://react.dev/ --description "React framework for UIs"

Skip Scraping (Use Cached Data)

# Fast rebuild using previously scraped data
skill-seekers scrape --config configs/godot.json --skip-scrape
# Time: 1-3 minutes (instant rebuild)

Async Mode (2-3x Faster Scraping)

# Enable async mode with 8 workers for best performance
skill-seekers scrape --config configs/react.json --async --workers 8

# Quick mode with async
skill-seekers scrape --name react --url https://react.dev/ --async --workers 8

# Dry run with async to test
skill-seekers scrape --config configs/godot.json --async --workers 4 --dry-run

Recommended Settings:

  • Small docs (~100-500 pages): --async --workers 4
  • Medium docs (~500-2000 pages): --async --workers 8
  • Large docs (2000+ pages): --async --workers 8 --no-rate-limit

Performance:

  • Sync: ~18 pages/sec, 120 MB memory
  • Async: ~55 pages/sec, 40 MB memory (3x faster!)

See full guide: ASYNC_SUPPORT.md

Enhancement Options

LOCAL Enhancement (Recommended - No API Key Required):

# During scraping
skill-seekers scrape --config configs/react.json --enhance-local

# Standalone after scraping
skill-seekers enhance output/react/

API Enhancement (Alternative - Requires API Key):

# During scraping
skill-seekers scrape --config configs/react.json --enhance

# Standalone after scraping
skill-seekers-enhance output/react/
skill-seekers-enhance output/react/ --api-key sk-ant-...

Package and Upload the Skill

# Package skill (opens folder, shows upload instructions)
skill-seekers package output/godot/
# Result: output/godot.zip

# Package and auto-upload (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers package output/godot/ --upload

# Upload existing .zip
skill-seekers upload output/godot.zip

# Package without opening folder
skill-seekers package output/godot/ --no-open

Install to AI Agents

# Single agent installation
skill-seekers install-agent output/godot/ --agent cursor

# Install to all agents
skill-seekers install-agent output/godot/ --agent all

# Force overwrite
skill-seekers install-agent output/godot/ --agent claude --force

# Dry run (preview only)
skill-seekers install-agent output/godot/ --agent cursor --dry-run

Supported agents: claude, cursor, vscode, copilot, amp, goose, opencode, letta, aide, windsurf, all

Installation paths:

  • Global agents (claude, amp, goose, etc.): Install to ~/.{agent}/skills/
  • Project agents (cursor, vscode): Install to .{agent}/skills/ in current directory

Force Re-scrape

# Delete cached data and re-scrape from scratch
rm -rf output/godot_data/
skill-seekers scrape --config configs/godot.json

Estimate Page Count (Before Scraping)

# Quick estimation - discover up to 100 pages
skill-seekers estimate configs/react.json --max-discovery 100
# Time: ~30-60 seconds

# Full estimation - discover up to 1000 pages (default)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes

# Deep estimation - discover up to 2000 pages
skill-seekers estimate configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes

# What it shows:
# - Estimated total pages
# - Recommended max_pages value
# - Estimated scraping time
# - Discovery rate (pages/sec)

Why use estimation:

  • Validates config URL patterns before full scrape
  • Helps set optimal max_pages value
  • Estimates total scraping time
  • Fast (only HEAD requests + minimal parsing)
  • No data downloaded or stored

Repository Architecture

File Structure (v2.5.0 - Multi-Platform Architecture)

Skill_Seekers/
├── pyproject.toml              # Modern Python package configuration (PEP 621)
├── src/                        # Source code (src/ layout best practice)
│   └── skill_seekers/
│       ├── __init__.py
│       ├── cli/                # CLI tools (entry points)
│       │   ├── main.py             # Unified CLI dispatcher (Git-style)
│       │   ├── doc_scraper.py      # Main scraper (~790 lines)
│       │   ├── estimate_pages.py   # Page count estimator
│       │   ├── enhance_skill_local.py  # AI enhancement (local)
│       │   ├── package_skill.py    # Skill packager
│       │   ├── upload_skill.py     # Upload to platforms
│       │   ├── install_skill.py    # Complete workflow automation
│       │   ├── install_agent.py    # Install to AI agent directories
│       │   ├── github_scraper.py   # GitHub scraper
│       │   ├── pdf_scraper.py      # PDF scraper
│       │   ├── unified_scraper.py  # Unified multi-source scraper
│       │   ├── merge_sources.py    # Source merger
│       │   ├── conflict_detector.py # Conflict detection
│       │   └── adaptors/           # Platform adaptor architecture
│       │       ├── __init__.py         # Factory: get_adaptor(target)
│       │       ├── base_adaptor.py     # Abstract base class
│       │       ├── claude_adaptor.py   # Claude AI implementation
│       │       ├── gemini_adaptor.py   # Google Gemini implementation
│       │       ├── openai_adaptor.py   # OpenAI ChatGPT implementation
│       │       └── markdown_adaptor.py # Generic Markdown export
│       └── mcp/                # MCP server integration
│           ├── server.py           # FastMCP-based server (stdio + HTTP)
│           └── tools/              # MCP tool implementations
├── tests/                      # Test suite (700+ tests passing)
│   ├── test_scraper_features.py
│   ├── test_config_validation.py
│   ├── test_integration.py
│   ├── test_mcp_server.py
│   ├── test_mcp_fastmcp.py     # FastMCP framework tests
│   ├── test_unified.py         # Unified scraping tests
│   ├── test_install_multiplatform.py  # Multi-platform tests
│   └── ... (40+ test files)
├── configs/                    # Preset configurations (24 configs)
│   ├── godot.json
│   ├── react.json
│   ├── django_unified.json     # Multi-source configs
│   └── ...
├── docs/                       # Documentation
│   ├── CLAUDE.md               # This file
│   ├── ENHANCEMENT.md          # Enhancement guide
│   ├── UPLOAD_GUIDE.md         # Upload instructions
│   └── UNIFIED_SCRAPING.md     # Unified scraping guide
├── README.md                   # User documentation
├── CHANGELOG.md                # Release history
├── FUTURE_RELEASES.md          # Roadmap
└── output/                     # Generated output (git-ignored)
    ├── {name}_data/            # Scraped raw data (cached)
    │   ├── pages/*.json        # Individual page data
    │   └── summary.json        # Scraping summary
    └── {name}/                 # Built skill directory
        ├── SKILL.md            # Main skill file
        ├── SKILL.md.backup     # Backup (if enhanced)
        ├── references/         # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/            # Empty (user scripts)
        └── assets/             # Empty (user assets)

Key Changes in v2.5.0:

  • Platform Adaptor Architecture: Clean separation for Claude, Gemini, OpenAI, Markdown
  • Multi-platform CLI: --target flag on package/upload/enhance commands
  • 18 MCP Tools: Extended from 9 to 18 tools with platform support
  • src/ layout: Modern Python packaging structure
  • pyproject.toml: PEP 621 compliant with optional platform dependencies
  • Entry points: skill-seekers CLI with Git-style subcommands
  • Published to PyPI: pip install skill-seekers + platform extras

Platform Adaptor Architecture (NEW in v2.5.0)

Design Pattern: Strategy pattern with factory method for platform-specific implementations

Key Components:

  • BaseAdaptor (src/skill_seekers/cli/adaptors/base_adaptor.py): Abstract base class defining interface

    • package(skill_dir, output_path) - Package skill in platform-specific format
    • upload(package_path, api_key) - Upload to platform API
    • enhance(skill_dir, mode) - AI enhancement using platform-specific model
  • Factory Function (src/skill_seekers/cli/adaptors/__init__.py):

    • get_adaptor(target: str) -> BaseAdaptor - Returns appropriate adaptor instance
    • Validates target and returns ClaudeAdaptor, GeminiAdaptor, OpenAIAdaptor, or MarkdownAdaptor
  • Platform-Specific Implementations:

    • ClaudeAdaptor: ZIP + YAML frontmatter, Anthropic Skills API, Sonnet 4 enhancement
    • GeminiAdaptor: tar.gz, Google Files API + Grounding, Gemini 2.0 Flash enhancement
    • OpenAIAdaptor: ZIP + Assistant instructions, Assistants API + Vector Store, GPT-4o enhancement
    • MarkdownAdaptor: ZIP with pure markdown, manual upload, no enhancement

Usage Pattern:

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'

# Package skill in platform format
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform (if supported)
adaptor.upload(package_path='output/react-gemini.tar.gz', api_key=os.getenv('GOOGLE_API_KEY'))

# AI enhancement with platform-specific model
adaptor.enhance(skill_dir='output/react/', mode='api')

Benefits:

  • Single codebase supports 4 platforms
  • Platform-specific optimizations (format, APIs, models)
  • Easy to add new platforms (implement BaseAdaptor)
  • Clean separation of concerns

Data Flow

  1. Scrape Phase (scrape_all() in src/skill_seekers/cli/doc_scraper.py):

    • Input: Config JSON (name, base_url, selectors, url_patterns, categories)
    • Process: BFS traversal from base_url, respecting include/exclude patterns
    • Output: output/{name}_data/pages/*.json + summary.json
  2. Build Phase (build_skill() in src/skill_seekers/cli/doc_scraper.py):

    • Input: Scraped JSON data from output/{name}_data/
    • Process: Load pages → Smart categorize → Extract patterns → Generate references
    • Output: output/{name}/SKILL.md + output/{name}/references/*.md
  3. Enhancement Phase (optional, platform-aware via adaptors):

    • Input: Built skill directory with references
    • Process: Platform-specific LLM analyzes references and rewrites SKILL.md
    • Output: Enhanced SKILL.md with real examples and guidance
    • Models: Claude Sonnet 4, Gemini 2.0 Flash, or GPT-4o (depending on target)
  4. Package Phase (platform-aware via adaptors):

    • Input: Skill directory + target platform
    • Process: Platform adaptor packages in appropriate format
    • Output: {name}.zip, {name}-gemini.tar.gz, {name}-openai.zip, or {name}-markdown.zip
  5. Upload Phase (optional, platform-aware via adaptors):

    • Input: Platform-specific package + API key
    • Process: Upload via platform API (Anthropic Skills, Google Files, OpenAI Assistants)
    • Output: Skill available in target LLM platform

Configuration File Structure

Config files (configs/*.json) define scraping behavior:

{
  "name": "godot",
  "description": "When to use this skill",
  "base_url": "https://docs.godotengine.org/en/stable/",
  "selectors": {
    "main_content": "div[role='main']",
    "title": "title",
    "code_blocks": "pre"
  },
  "url_patterns": {
    "include": [],
    "exclude": ["/search.html", "/_static/"]
  },
  "categories": {
    "getting_started": ["introduction", "getting_started"],
    "scripting": ["scripting", "gdscript"],
    "api": ["api", "reference", "class"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}

Config Parameters:

  • name: Skill identifier (output directory name)
  • description: When Claude should use this skill
  • base_url: Starting URL for scraping
  • selectors.main_content: CSS selector for main content (common: article, main, div[role="main"])
  • selectors.title: CSS selector for page title
  • selectors.code_blocks: CSS selector for code samples
  • url_patterns.include: Only scrape URLs containing these patterns
  • url_patterns.exclude: Skip URLs containing these patterns
  • categories: Keyword mapping for categorization
  • rate_limit: Delay between requests (seconds)
  • max_pages: Maximum pages to scrape
  • skip_llms_txt: Skip llms.txt detection, force HTML scraping (default: false)
  • exclude_dirs_additional: Add custom directories to default exclusions (for local repo analysis)
  • exclude_dirs: Replace default directory exclusions entirely (advanced, for local repo analysis)

Key Features & Implementation

Auto-Detect Existing Data

Tool checks for output/{name}_data/ and prompts to reuse, avoiding re-scraping (check_existing_data() in doc_scraper.py:653-660).

Configurable Directory Exclusions (Local Repository Analysis)

When using local_repo_path for unlimited local repository analysis, you can customize which directories to exclude from analysis.

Smart Defaults: Automatically excludes common directories: venv, node_modules, __pycache__, .git, build, dist, .pytest_cache, htmlcov, .tox, .mypy_cache, etc.

Extend Mode (exclude_dirs_additional): Add custom exclusions to defaults

{
  "sources": [{
    "type": "github",
    "local_repo_path": "/path/to/repo",
    "exclude_dirs_additional": ["proprietary", "legacy", "third_party"]
  }]
}

Replace Mode (exclude_dirs): Override defaults entirely (advanced)

{
  "sources": [{
    "type": "github",
    "local_repo_path": "/path/to/repo",
    "exclude_dirs": ["node_modules", ".git", "custom_vendor"]
  }]
}

Use Cases:

  • Monorepos with custom directory structures
  • Enterprise projects with non-standard naming
  • Including unusual directories (e.g., analyzing venv code)
  • Minimal exclusions for small/simple projects

See: should_exclude_dir() in github_scraper.py:304-306

Language Detection

Detects code languages from:

  1. CSS class attributes (language-*, lang-*)
  2. Heuristics (keywords like def, const, func, etc.)

See: detect_language() in doc_scraper.py:135-165

Pattern Extraction

Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).

See: extract_patterns() in doc_scraper.py:167-183

Smart Categorization

  • Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
  • Threshold of 2+ for categorization
  • Auto-infers categories from URL segments if none provided
  • Falls back to "other" category

See: smart_categorize() and infer_categories() in doc_scraper.py:282-351

Enhanced SKILL.md Generation

Generated with:

  • Real code examples from documentation (language-annotated)
  • Quick reference patterns extracted from docs
  • Common pattern section
  • Category file listings

See: create_enhanced_skill_md() in doc_scraper.py:426-542

Common Workflows

First Time (With Scraping + Enhancement)

# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local

# 2. Wait for enhancement terminal to close (~60 seconds)

# 3. Verify quality
cat output/godot/SKILL.md

# 4. Package
skill-seekers package output/godot/

# Result: godot.zip ready for Claude
# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)

Using Cached Data (Fast Iteration)

# 1. Use existing data + Local Enhancement
skill-seekers scrape --config configs/godot.json --skip-scrape
skill-seekers enhance output/godot/

# 2. Package
skill-seekers package output/godot/

# Time: 1-3 minutes (build) + 60 seconds (enhancement)

Without Enhancement (Basic)

# 1. Scrape + Build (no enhancement)
skill-seekers scrape --config configs/godot.json

# 2. Package
skill-seekers package output/godot/

# Note: SKILL.md will be basic template - enhancement recommended
# Time: 20-40 minutes

Creating a New Framework Config

Option 1: Interactive

skill-seekers scrape --interactive
# Follow prompts, it creates the config for you

Option 2: Copy and Modify

# Copy a preset
cp configs/react.json configs/myframework.json

# Edit it
nano configs/myframework.json

# Test with limited pages first
# Set "max_pages": 20 in config

# Use it
skill-seekers scrape --config configs/myframework.json

Testing & Verification

Finding the Right CSS Selectors

Before creating a config, test selectors with BeautifulSoup:

from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('div.content'))

# Test code block selector
print(soup.select('pre code'))
print(soup.select('pre'))

Verify Output Quality

After building, verify the skill quality:

# Check SKILL.md has real examples
cat output/godot/SKILL.md

# Check category structure
cat output/godot/references/index.md

# List all reference files
ls output/godot/references/

# Check specific category content
cat output/godot/references/getting_started.md

# Verify code samples have language detection
grep -A 3 "```" output/godot/references/*.md | head -20

Test with Limited Pages

For faster testing, edit config to limit pages:

{
  "max_pages": 20  // Test with just 20 pages
}

Troubleshooting

No Content Extracted

Problem: Pages scraped but content is empty

Solution: Check main_content selector in config. Try:

  • article
  • main
  • div[role="main"]
  • div.content

Use the BeautifulSoup testing approach above to find the right selector.

Poor Categorization

Problem: Pages not categorized well

Solution: Edit categories section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data:

# See what URLs were scraped
cat output/godot_data/summary.json | grep url | head -20

Data Exists But Won't Use It

Problem: Tool won't reuse existing data

Solution: Force re-scrape:

rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json

Rate Limiting Issues

Problem: Getting rate limited or blocked by documentation server

Solution: Increase rate_limit value in config:

{
  "rate_limit": 1.0  // Change from 0.5 to 1.0 seconds
}

Package Path Error

Problem: doc_scraper.py shows wrong cli/package_skill.py path

Expected output:

skill-seekers package output/godot/

Not:

python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/godot/

The correct command uses the local cli/package_skill.py in the repository root.

Key Code Locations (v2.0.0)

Documentation Scraper (src/skill_seekers/cli/doc_scraper.py):

  • URL validation: is_valid_url()
  • Content extraction: extract_content()
  • Language detection: detect_language()
  • Pattern extraction: extract_patterns()
  • Smart categorization: smart_categorize()
  • Category inference: infer_categories()
  • Quick reference generation: generate_quick_reference()
  • SKILL.md generation: create_enhanced_skill_md()
  • Scraping loop: scrape_all()
  • Main workflow: main()

Other Key Files:

  • GitHub scraper: src/skill_seekers/cli/github_scraper.py
  • PDF scraper: src/skill_seekers/cli/pdf_scraper.py
  • Unified scraper: src/skill_seekers/cli/unified_scraper.py
  • Conflict detection: src/skill_seekers/cli/conflict_detector.py
  • Source merger: src/skill_seekers/cli/merge_sources.py
  • Package tool: src/skill_seekers/cli/package_skill.py
  • Upload tool: src/skill_seekers/cli/upload_skill.py
  • MCP server: src/skill_seekers/mcp/server.py
  • Entry points: pyproject.toml (project.scripts section)

Enhancement Details

  • Uses your Claude Code Max plan (no API costs)
  • Opens new terminal with Claude Code
  • Analyzes reference files automatically
  • Takes 30-60 seconds
  • Quality: 9/10 (comparable to API version)
  • Backs up original SKILL.md to SKILL.md.backup

API Enhancement (Alternative)

  • Uses Anthropic API (~$0.15-$0.30 per skill)
  • Requires ANTHROPIC_API_KEY
  • Same quality as LOCAL
  • Faster (no terminal launch)
  • Better for automation/CI

What Enhancement Does:

  1. Reads reference documentation files
  2. Analyzes content with Claude
  3. Extracts 5-10 best code examples
  4. Creates comprehensive quick reference
  5. Adds domain-specific key concepts
  6. Provides navigation guidance for different skill levels
  7. Transforms 75-line templates into 500+ line comprehensive guides

Performance

Task Time Notes
Scraping 15-45 min First time only
Building 1-3 min Fast!
Re-building <1 min With --skip-scrape
Enhancement (LOCAL) 30-60 sec Uses Claude Code Max
Enhancement (API) 20-40 sec Requires API key
Packaging 5-10 sec Final zip

Available Configs (24 Total)

Single-Source Documentation Configs (14 configs)

Web Frameworks:

  • react.json - React (article selector, 7,102 chars)
  • vue.json - Vue.js (main selector, 1,029 chars)
  • astro.json - Astro (article selector, 145 chars)
  • django.json - Django (article selector, 6,468 chars)
  • laravel.json - Laravel 9.x (#main-content selector, 16,131 chars)
  • fastapi.json - FastAPI (article selector, 11,906 chars)
  • hono.json - Hono web framework NEW!

DevOps & Automation:

  • ansible-core.json - Ansible Core 2.19 (div[role='main'] selector, ~32K chars)
  • kubernetes.json - Kubernetes (main selector, 2,100 chars)

Game Engines:

  • godot.json - Godot (div[role='main'] selector, 1,688 chars)
  • godot-large-example.json - Godot large docs example

CSS & Utilities:

  • tailwind.json - Tailwind CSS (div.prose selector, 195 chars)

Gaming:

  • steam-economy-complete.json - Steam Economy (div.documentation_bbcode, 588 chars)

Development Tools:

  • claude-code.json - Claude Code documentation NEW!

Unified Multi-Source Configs (5 configs - NEW v2.0!)

  • react_unified.json - React (docs + GitHub + code analysis)
  • django_unified.json - Django (docs + GitHub + code analysis)
  • fastapi_unified.json - FastAPI (docs + GitHub + code analysis)
  • fastapi_unified_test.json - FastAPI test config
  • godot_unified.json - Godot (docs + GitHub + code analysis)

Test/Example Configs (5 configs)

  • 📝 godot_github.json - GitHub-only scraping example
  • 📝 react_github.json - GitHub-only scraping example
  • 📝 python-tutorial-test.json - Python tutorial test
  • 📝 example_pdf.json - PDF extraction example
  • 📝 test-manual.json - Manual testing config

Note: All configs verified and working! Unified configs fully tested with 22 passing tests. Last verified: November 29, 2025 (Post-v2.1.0 bug fixes)

Additional Documentation

User Guides:

Technical Documentation:

Project Planning:

Notes for Claude Code

Project Status (v2.0.0):

  • Published on PyPI: Install with pip install skill-seekers
  • Modern Python Packaging: pyproject.toml, src/ layout, entry points
  • Unified CLI: Single skill-seekers command with Git-style subcommands
  • CI/CD Working: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12)
  • Test Coverage: 391 tests passing, 39% coverage
  • Documentation: Complete user and technical documentation

Architecture:

  • Python-based documentation scraper with multi-source support
  • Main scraper: src/skill_seekers/cli/doc_scraper.py (~790 lines)
  • Unified scraping: Combines docs + GitHub + PDF with conflict detection
  • Modern packaging: PEP 621 compliant with proper dependency management
  • MCP Integration: 9 tools for Claude Code Max integration

CLI Architecture (Git-style subcommands):

  • Entry point: src/skill_seekers/cli/main.py - Unified CLI dispatcher
  • Subcommands: scrape, github, pdf, unified, enhance, package, upload, estimate
  • Design pattern: Main CLI routes to individual tool entry points (delegates to existing main() functions)
  • Backward compatibility: Individual tools (skill-seekers-scrape, etc.) still work directly
  • Key insight: The unified CLI modifies sys.argv and calls existing main() functions to maintain compatibility

Development Workflow:

  1. Install: pip install -e . (editable mode for development)

    # Install with all platform dependencies
    pip install -e ".[all-llms]"
    
    # Or install specific platforms
    pip install -e ".[gemini]"  # Google Gemini support
    pip install -e ".[openai]"  # OpenAI ChatGPT support
    
  2. Run tests:

    # All tests
    pytest tests/ -v
    
    # Specific test file
    pytest tests/test_scraper_features.py -v
    
    # Multi-platform tests
    pytest tests/test_install_multiplatform.py -v
    
    # With coverage
    pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
    
    # Single test
    pytest tests/test_scraper_features.py::test_detect_language -v
    
    # MCP server tests
    pytest tests/test_mcp_fastmcp.py -v
    
  3. Build package:

    # Using uv (recommended)
    uv build
    
    # Or using build
    python -m build
    
  4. Publish:

    # To PyPI
    uv publish
    
    # Or using twine
    python -m twine upload dist/*
    
  5. Test CLI commands:

    # Test scraping (dry run)
    skill-seekers scrape --config configs/react.json --dry-run
    
    # Test multi-platform packaging
    skill-seekers package output/react/ --target gemini --dry-run
    
    # Test MCP server (stdio mode)
    python -m skill_seekers.mcp.server
    
    # Test MCP server (HTTP mode)
    python -m skill_seekers.mcp.server --transport http --port 8765
    

Test Architecture:

  • Test files: 40+ test files covering all features (see tests/ directory)
  • CI Matrix: Tests run on Ubuntu + macOS with Python 3.10, 3.11, 3.12, 3.13
  • Coverage: 700+ tests passing across all platforms
  • Key test categories:
    • test_scraper_features.py - Core scraping functionality
    • test_mcp_server.py - MCP integration (18 tools)
    • test_mcp_fastmcp.py - FastMCP framework and HTTP transport
    • test_unified.py - Multi-source scraping
    • test_github_scraper.py - GitHub repository analysis
    • test_pdf_scraper.py - PDF extraction
    • test_install_multiplatform.py - NEW Multi-platform packaging and upload
    • test_integration.py - End-to-end workflows
    • test_install_skill.py - One-command install workflow
    • test_install_agent.py - AI agent installation
  • IMPORTANT: Must run pip install -e . before tests (src/ layout requirement)
  • Platform Tests: Each platform adaptor has dedicated test coverage

Environment Variables & API Keys:

# Claude AI (default platform)
export ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini (optional)
export GOOGLE_API_KEY=AIza...

# OpenAI ChatGPT (optional)
export OPENAI_API_KEY=sk-...

# GitHub (for higher rate limits in repo scraping)
export GITHUB_TOKEN=ghp_...

# Private config repositories (optional)
export GITLAB_TOKEN=glpat-...
export GITEA_TOKEN=...
export BITBUCKET_TOKEN=...

Key Points:

  • Output is cached and reusable in output/ (git-ignored)
  • Enhancement is optional but highly recommended
  • All 24 configs are working and tested
  • CI workflow requires pip install -e . to install package before running tests
  • Never skip tests - all tests must pass before commits (per user instructions)
  • Platform-specific dependencies are optional: use pip install skill-seekers[gemini] or pip install skill-seekers[openai] as needed