firefrost-gaming/skill-seekers-reference

Files

yusyus bddb57f5ef Add large documentation handling (40K+ pages support)

Implement comprehensive system for handling very large documentation sites
with intelligent splitting strategies and router/hub architecture.

**New CLI Tools:**
- cli/split_config.py: Split large configs into focused sub-skills
  * Strategies: auto, category, router, size
  * Configurable target pages per skill (default: 5000)
  * Dry-run mode for preview

- cli/generate_router.py: Create intelligent router/hub skills
  * Auto-generates routing logic based on keywords
  * Creates SKILL.md with topic-to-skill mapping
  * Infers router name from sub-skills

- cli/package_multi.py: Batch package multiple skills
  * Package router + all sub-skills in one command
  * Progress tracking for each skill

**MCP Integration:**
- Added split_config tool (8 total MCP tools now)
- Added generate_router tool
- Supports 40K+ page documentation via MCP

**Configuration:**
- New split_strategy parameter in configs
- split_config section for fine-tuned control
- checkpoint section for resume capability (ready for Phase 4)
- Example: configs/godot-large-example.json

**Documentation:**
- docs/LARGE_DOCUMENTATION.md (500+ lines)
  * Complete guide for 10K+ page documentation
  * All splitting strategies explained
  * Detailed workflows with examples
  * Best practices and troubleshooting
  * Real-world examples (AWS, Microsoft, Godot)

**Features:**
✅ Handle 40K+ page documentation efficiently
✅ Parallel scraping support (5x-10x faster)
✅ Router + sub-skills architecture
✅ Intelligent keyword-based routing
✅ Multiple splitting strategies
✅ Full MCP integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-19 20:48:03 +03:00

README.md

Add MCP server implementation with 6 tools

2025-10-19 19:43:25 +03:00

requirements.txt

Refactor: Convert to monorepo with CLI and MCP server

2025-10-19 15:19:53 +03:00

server.py

Add large documentation handling (40K+ pages support)

2025-10-19 20:48:03 +03:00

README.md

Skill Seeker MCP Server

Model Context Protocol (MCP) server for Skill Seeker - enables Claude Code to generate documentation skills directly.

What is This?

This MCP server allows Claude Code to use Skill Seeker's tools directly through natural language commands. Instead of running CLI commands manually, you can ask Claude Code to:

- Generate config files for any documentation site
- Estimate page counts before scraping
- Scrape documentation and build skills
- Package skills into .zip files
- List and validate configurations

Quick Start

1. Install Dependencies

# From repository root
pip3 install -r mcp/requirements.txt
pip3 install requests beautifulsoup4

2. Quick Setup (Automated)

# Run the setup script
./setup_mcp.sh

# Follow the prompts - it will:
# - Install dependencies
# - Test the server
# - Generate configuration
# - Guide you through Claude Code setup

3. Manual Setup

Add to ~/.config/claude-code/mcp.json:

{
  "mcpServers": {
    "skill-seeker": {
      "command": "python3",
      "args": [
        "/path/to/Skill_Seekers/mcp/server.py"
      ],
      "cwd": "/path/to/Skill_Seekers"
    }
  }
}

Replace /path/to/Skill_Seekers with your actual repository path!
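
If you prefer to generate the entry rather than edit it by hand, the following is a minimal sketch that builds the same JSON structure from a repository path. The helper name `build_mcp_entry` is hypothetical and not part of the repository; the keys mirror the manual configuration shown above.

```python
import json
from pathlib import PurePosixPath


def build_mcp_entry(repo_root: str) -> dict:
    """Return the mcpServers entry from the manual setup for a given repo path.

    Hypothetical helper: it only reproduces the JSON structure shown above.
    """
    root = PurePosixPath(repo_root)
    return {
        "mcpServers": {
            "skill-seeker": {
                "command": "python3",
                "args": [str(root / "mcp" / "server.py")],
                "cwd": str(root),
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into ~/.config/claude-code/mcp.json.
    print(json.dumps(build_mcp_entry("/path/to/Skill_Seekers"), indent=2))
```

Using one `repo_root` value for both `args` and `cwd` avoids the two paths drifting apart when the repository is moved.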

4. Restart Claude Code

Quit and reopen Claude Code (don't just close the window).

5. Test

In Claude Code, type:

List all available configs

You should see a list of preset configurations (Godot, React, Vue, etc.).

Available Tools

The MCP server exposes 6 tools:

1. `generate_config`

Create a new configuration file for any documentation website.

Parameters:

- `name` (required): Skill name (e.g., "tailwind")
- `url` (required): Documentation URL (e.g., "https://tailwindcss.com/docs")
- `description` (required): When to use this skill
- `max_pages` (optional): Maximum pages to scrape (default: 100)
- `rate_limit` (optional): Delay between requests in seconds (default: 0.5)

Example:

Generate config for Tailwind CSS at https://tailwindcss.com/docs

2. `estimate_pages`

Estimate how many pages will be scraped from a config (fast, no data downloaded).

Parameters:

- `config_path` (required): Path to config file (e.g., "configs/react.json")
- `max_discovery` (optional): Maximum pages to discover (default: 1000)

Example:

Estimate pages for configs/react.json

3. `scrape_docs`

Scrape documentation and build a Claude skill.

Parameters:

- `config_path` (required): Path to config file
- `enhance_local` (optional): Open terminal for local enhancement (default: false)
- `skip_scrape` (optional): Use cached data (default: false)
- `dry_run` (optional): Preview without saving (default: false)

Example:

Scrape docs using configs/react.json

4. `package_skill`

Package a skill directory into a .zip file ready for Claude upload.

Parameters:

- `skill_dir` (required): Path to skill directory (e.g., "output/react/")

Example:

Package skill at output/react/

5. `list_configs`

List all available preset configurations.

Parameters: None

Example:

List all available configs

6. `validate_config`

Validate a config file for errors.

Parameters:

- `config_path` (required): Path to config file

Example:

Validate configs/godot.json
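
To give a feel for what such a validation might check, here is a minimal sketch. It is not the server's actual implementation: the field names (`name`, `base_url`, `max_pages`, `rate_limit`) are assumptions inferred from the parameters and validation output shown in this README, and the function name `find_config_errors` is hypothetical.

```python
def find_config_errors(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid.

    Assumed schema (not confirmed by the source): name, base_url,
    max_pages, rate_limit.
    """
    errors = []

    # Required string fields must be present and non-empty.
    for key in ("name", "base_url"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            errors.append(f"missing or empty '{key}'")

    # A base URL, when given, should be an HTTP(S) URL.
    base_url = config.get("base_url")
    if isinstance(base_url, str) and base_url:
        if not base_url.startswith(("http://", "https://")):
            errors.append("'base_url' must start with http:// or https://")

    # Optional numeric fields fall back to the documented defaults.
    max_pages = config.get("max_pages", 100)
    if not isinstance(max_pages, int) or max_pages <= 0:
        errors.append("'max_pages' must be a positive integer")

    rate_limit = config.get("rate_limit", 0.5)
    if not isinstance(rate_limit, (int, float)) or rate_limit < 0:
        errors.append("'rate_limit' must be a non-negative number")

    return errors
```

Returning a list of problems instead of raising on the first one lets a single validation pass report everything that needs fixing.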

Example Workflows

Generate a New Skill from Scratch

User: Generate config for Svelte at https://svelte.dev/docs

Claude: ✅ Config created: configs/svelte.json

User: Estimate pages for configs/svelte.json

Claude: 📊 Estimated pages: 150

User: Scrape docs using configs/svelte.json

Claude: ✅ Skill created at output/svelte/

User: Package skill at output/svelte/

Claude: ✅ Created: output/svelte.zip
      Ready to upload to Claude!

Use Existing Preset

User: List all available configs

Claude: [Shows all configs: godot, react, vue, django, fastapi, etc.]

User: Scrape docs using configs/react.json

Claude: ✅ Skill created at output/react/

User: Package skill at output/react/

Claude: ✅ Created: output/react.zip

Validate Before Scraping

User: Validate configs/godot.json

Claude: ✅ Config is valid!
        Name: godot
        Base URL: https://docs.godotengine.org/en/stable/
        Max pages: 500
        Rate limit: 0.5s

User: Scrape docs using configs/godot.json

Claude: [Starts scraping...]

Architecture

Server Structure

mcp/
├── server.py           # Main MCP server
├── requirements.txt    # MCP dependencies
└── README.md          # This file

How It Works

1. Claude Code sends MCP requests to the server
2. The server routes each request to the appropriate tool function
3. Tools call CLI scripts (doc_scraper.py, estimate_pages.py, etc.)
4. CLI scripts perform the actual work (scraping, packaging, etc.)
5. Results are returned to Claude Code via MCP
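
The routing step described above can be sketched as a dictionary that maps tool names to async handlers. This is an illustration only: it does not use the MCP SDK, the handlers return plain strings instead of MCP content objects, and the names `TOOLS` and `call_tool` are assumptions rather than the server's real identifiers.

```python
import asyncio
from typing import Awaitable, Callable

# A handler takes the tool arguments and returns a text result.
ToolFn = Callable[[dict], Awaitable[str]]


async def list_configs_tool(args: dict) -> str:
    # Placeholder body; the real tool would inspect the configs/ directory.
    return "configs: godot, react, vue"


async def validate_config_tool(args: dict) -> str:
    # Placeholder body; the real tool would load and check the file.
    return f"validated {args['config_path']}"


# Routing table: tool name -> handler (hypothetical structure).
TOOLS: dict[str, ToolFn] = {
    "list_configs": list_configs_tool,
    "validate_config": validate_config_tool,
}


async def call_tool(name: str, args: dict) -> str:
    """Dispatch a request to the matching handler, or report an unknown tool."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    return await handler(args)


if __name__ == "__main__":
    print(asyncio.run(call_tool("validate_config", {"config_path": "configs/godot.json"})))
```

A table-driven dispatcher keeps the routing logic in one place, so adding a tool means adding one handler and one dictionary entry.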

Tool Implementation

Each tool is implemented as an async function:

async def generate_config_tool(args: dict) -> list[TextContent]:
    """Generate a config file"""
    # Create config JSON
    # Save to configs/
    # Return success message
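
As a rough illustration of the three commented steps, the sketch below builds a config dictionary from the documented `generate_config` parameters and derives the output path. The JSON field names (in particular `base_url`) and the helper names `build_config` and `config_path_for` are assumptions, and the sketch returns plain values rather than `TextContent`.

```python
import json


def build_config(args: dict) -> dict:
    """Create the config dict, applying the documented defaults.

    Field names are assumed, not taken from the repository.
    """
    return {
        "name": args["name"],
        "base_url": args["url"],
        "description": args["description"],
        "max_pages": args.get("max_pages", 100),    # documented default
        "rate_limit": args.get("rate_limit", 0.5),  # documented default, seconds
    }


def config_path_for(config: dict) -> str:
    """Return the path under configs/ where the file would be saved."""
    return f"configs/{config['name']}.json"


if __name__ == "__main__":
    cfg = build_config({
        "name": "tailwind",
        "url": "https://tailwindcss.com/docs",
        "description": "Use when working with Tailwind CSS",
    })
    # Show what would be written, without touching the filesystem.
    print(config_path_for(cfg))
    print(json.dumps(cfg, indent=2))
```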

Tools use subprocess.run() to call CLI scripts:

result = subprocess.run([
    sys.executable,
    str(CLI_DIR / "doc_scraper.py"),
    "--config", config_path
], capture_output=True, text=True)
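
A caller usually also needs to turn the `subprocess.run()` result into a message. The following is a minimal sketch of that step under stated assumptions: the wrapper name `run_cli` and the timeout are hypothetical, and the real tools may format their output differently.

```python
import subprocess
import sys


def run_cli(argv: list[str], timeout: float = 60.0) -> str:
    """Run a Python command with the current interpreter and return its output.

    On a non-zero exit code, return an error message built from stderr
    instead of raising. Raises subprocess.TimeoutExpired if the command
    runs longer than the timeout.
    """
    result = subprocess.run(
        [sys.executable, *argv],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,
    )
    if result.returncode != 0:
        return f"Error (exit {result.returncode}): {result.stderr.strip()}"
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a CLI script: a one-line Python program.
    print(run_cli(["-c", "print('ok')"]))
```

Using `sys.executable` means the CLI script runs under the same interpreter (and virtual environment) as the server.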

Testing

The MCP server has comprehensive test coverage:

# Run MCP server tests (25 tests)
python3 -m pytest tests/test_mcp_server.py -v

# Expected output: 25 passed in ~0.3s

Test Coverage

- Server initialization (2 tests)
- Tool listing (2 tests)
- generate_config (3 tests)
- estimate_pages (3 tests)
- scrape_docs (4 tests)
- package_skill (2 tests)
- list_configs (3 tests)
- validate_config (3 tests)
- Tool routing (2 tests)
- Integration (1 test)

Total: 25 tests | Pass rate: 100%

Troubleshooting

MCP Server Not Loading

Symptoms:

- Tools don't appear in Claude Code
- No response to skill-seeker commands

Solutions:

1. Check configuration:
```
cat ~/.config/claude-code/mcp.json
```

2. Verify the server can start:

python3 mcp/server.py
# Should start without errors (Ctrl+C to exit)

3. Check dependencies:
```
pip3 install -r mcp/requirements.txt
```

4. Completely restart Claude Code (quit and reopen)

5. Check Claude Code logs:
- macOS: ~/Library/Logs/Claude Code/
- Linux: ~/.config/claude-code/logs/

"ModuleNotFoundError: No module named 'mcp'"

pip3 install -r mcp/requirements.txt

Tools Appear But Don't Work

Solutions:

1. Verify that cwd in the config points to the repository root

2. Check that the CLI tools exist:

ls cli/doc_scraper.py
ls cli/estimate_pages.py
ls cli/package_skill.py

3. Test the CLI tools directly:
```
python3 cli/doc_scraper.py --help
```

Slow Operations

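- Check rate_limit in your configs. It is the delay between requests in seconds, so a lower value speeds up scraping; keep it high enough to stay polite to the target site
- Use a smaller max_pages for testing
- Use skip_scrape to avoid re-downloading data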
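
To see how much of a scrape is spent purely waiting, you can multiply the page count by the delay. The sketch below is a back-of-the-envelope estimate only: it assumes one delay per page and ignores download and parsing time, and the function name `min_delay_seconds` is hypothetical.

```python
def min_delay_seconds(pages: int, rate_limit: float) -> float:
    """Lower bound on scrape time caused by the per-request delay alone.

    Assumes one rate_limit pause per page; network and parsing time
    come on top of this.
    """
    return pages * rate_limit


if __name__ == "__main__":
    # 500 pages at the default 0.5 s delay.
    seconds = min_delay_seconds(500, 0.5)
    print(f"{seconds:.0f} s (~{seconds / 60:.1f} min) of pure waiting")
```

With `max_pages` of 500 and the default `rate_limit` of 0.5, the delay alone accounts for 250 seconds, a little over four minutes.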

Advanced Configuration

Using Virtual Environment

# Create venv
python3 -m venv venv
source venv/bin/activate
pip install -r mcp/requirements.txt
pip install requests beautifulsoup4
which python3  # Copy this path

Configure Claude Code to use venv Python:

{
  "mcpServers": {
    "skill-seeker": {
      "command": "/path/to/Skill_Seekers/venv/bin/python3",
      "args": ["/path/to/Skill_Seekers/mcp/server.py"],
      "cwd": "/path/to/Skill_Seekers"
    }
  }
}
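
To confirm that the interpreter you configured really runs inside the virtual environment, a small standard-library check like the one below can help. It is a sketch, not part of the repository; run it with the same `command` path you put in the config.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    In a venv created by `python3 -m venv`, sys.prefix points at the venv
    while sys.base_prefix points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
```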

Debug Mode

Enable verbose logging:

{
  "mcpServers": {
    "skill-seeker": {
      "command": "python3",
      "args": ["-u", "/path/to/Skill_Seekers/mcp/server.py"],
      "cwd": "/path/to/Skill_Seekers",
      "env": {
        "DEBUG": "1"
      }
    }
  }
}
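
How the server consumes `DEBUG` is not shown in this README. As one plausible pattern, a server could map the variable to a logging level, as in this sketch; the function name `configure_logging` and the exact `"1"` check are assumptions. Logging goes to stderr so it would not mix with any protocol traffic on stdout.

```python
import logging
import os
import sys
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Set the root log level from a DEBUG variable and return that level.

    Hypothetical behaviour: DEBUG=1 enables debug output, anything else
    falls back to INFO.
    """
    env = os.environ if env is None else env
    level = logging.DEBUG if env.get("DEBUG") == "1" else logging.INFO
    # basicConfig only configures the root logger the first time it is called.
    logging.basicConfig(level=level, stream=sys.stderr)
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("skill-seeker").debug("verbose logging enabled")
```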

With API Enhancement

For API-based enhancement (requires Anthropic API key):

{
  "mcpServers": {
    "skill-seeker": {
      "command": "python3",
      "args": ["/path/to/Skill_Seekers/mcp/server.py"],
      "cwd": "/path/to/Skill_Seekers",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-your-key-here"
      }
    }
  }
}

Performance

| Operation | Time | Notes |
|-----------|------|-------|
| List configs | <1s | Instant |
| Generate config | <1s | Creates JSON file |
| Validate config | <1s | Quick validation |
| Estimate pages | 1-2min | Fast, no data download |
| Scrape docs | 15-45min | First time only |
| Scrape (cached) | <1min | With `skip_scrape` |
| Package skill | 5-10s | Creates .zip |

Documentation

- Full Setup Guide: docs/MCP_SETUP.md
- Main README: README.md
- Usage Guide: docs/USAGE.md
- Testing Guide: docs/TESTING.md

Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions

License

MIT License - See LICENSE for details
