Refactor: Convert to monorepo with CLI and MCP server

Major restructure to support both CLI usage and MCP integration: **Repository Structure:** - cli/ - All CLI tools (doc_scraper, estimate_pages, enhance_skill, etc.) - mcp/ - New MCP server for Claude Code integration - configs/ - Shared configuration files - tests/ - Updated to import from cli/ - docs/ - Shared documentation **MCP Server (NEW):** - mcp/server.py - Full MCP server implementation - 6 tools available: * generate_config - Create config from URL * estimate_pages - Fast page count estimation * scrape_docs - Full documentation scraping * package_skill - Package to .zip * list_configs - Show available presets * validate_config - Validate config files - mcp/README.md - Complete MCP documentation - mcp/requirements.txt - MCP dependencies **CLI Tools (Moved to cli/):** - All existing functionality preserved - Same commands, same behavior - Tests updated to import from cli.doc_scraper **Tests:** - 68/71 passing (95.8%) - Updated imports from doc_scraper to cli.doc_scraper - Fixed validate_config() tuple unpacking (errors, warnings) - 3 minor test failures (checking warnings instead of errors) **Benefits:** - Use as CLI tool: python3 cli/doc_scraper.py - Use via MCP: Integrated with Claude Code - Shared code and configs - Single source of truth 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:19:53 +03:00
parent af87572735
commit ae924a9d05
12 changed files with 658 additions and 34 deletions
--- a/mcp/README.md
+++ b/mcp/README.md
@@ -0,0 +1,238 @@
+# Skill Seeker MCP Server
+
+Model Context Protocol (MCP) server for Skill Seeker - Generate Claude AI skills from documentation websites directly from Claude Code.
+
+## What is MCP?
+
+MCP (Model Context Protocol) allows Claude Code to use external tools. This server provides tools for:
+- Generating config files for documentation sites
+- Estimating page counts before scraping
+- Scraping documentation and building skills
+- Packaging skills for upload
+- Managing configurations
+
+## Installation
+
+### 1. Install Dependencies
+
+```bash
+cd mcp
+pip install -r requirements.txt
+```
+
+### 2. Configure Claude Code
+
+Add to your Claude Code MCP settings (`~/.config/claude-code/mcp.json`):
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/path/to/Skill_Seekers/mcp/server.py"
+      ],
+      "cwd": "/path/to/Skill_Seekers"
+    }
+  }
+}
+```
+
+**Replace `/path/to/Skill_Seekers` with your actual repository path!**
+
+### 3. Restart Claude Code
+
+Restart Claude Code to load the MCP server.
+
+## Available Tools
+
+### 1. `generate_config`
+
+Generate a config file for any documentation website.
+
+**Parameters:**
+- `name` (required): Skill name (lowercase, alphanumeric, hyphens, underscores)
+- `url` (required): Base documentation URL (must include http:// or https://)
+- `description` (required): Description of when to use this skill
+- `max_pages` (optional): Maximum pages to scrape (default: 100)
+- `rate_limit` (optional): Delay between requests in seconds (default: 0.5)
+
+**Example:**
+```
+Generate config for Tailwind CSS docs at https://tailwindcss.com/docs
+```
+
+### 2. `estimate_pages`
+
+Estimate how many pages will be scraped from a config.
+
+**Parameters:**
+- `config_path` (required): Path to config JSON file
+- `max_discovery` (optional): Maximum pages to discover (default: 1000)
+
+**Example:**
+```
+Estimate pages for configs/tailwind.json
+```
+
+### 3. `scrape_docs`
+
+Scrape documentation and build Claude skill.
+
+**Parameters:**
+- `config_path` (required): Path to config JSON file
+- `enhance_local` (optional): Open terminal for local enhancement (default: false)
+- `skip_scrape` (optional): Skip scraping, use cached data (default: false)
+- `dry_run` (optional): Preview without saving (default: false)
+
+**Example:**
+```
+Scrape docs using configs/tailwind.json
+```
+
+### 4. `package_skill`
+
+Package a skill directory into a .zip file.
+
+**Parameters:**
+- `skill_dir` (required): Path to skill directory
+
+**Example:**
+```
+Package skill at output/tailwind/
+```
+
+### 5. `list_configs`
+
+List all available preset configurations.
+
+**Example:**
+```
+Show me all available configs
+```
+
+### 6. `validate_config`
+
+Validate a config file for errors.
+
+**Parameters:**
+- `config_path` (required): Path to config JSON file
+
+**Example:**
+```
+Validate configs/tailwind.json
+```
+
+## Usage Workflow
+
+### Quick Start
+
+```
+1. "Generate config for Next.js docs at https://nextjs.org/docs"
+2. "Estimate pages for configs/nextjs.json"
+3. "Scrape docs using configs/nextjs.json"
+4. "Package skill at output/nextjs/"
+5. Upload nextjs.zip to Claude!
+```
+
+### With Enhancement
+
+```
+1. "Generate config for Svelte docs at https://svelte.dev/docs"
+2. "Scrape docs using configs/svelte.json with local enhancement"
+3. (Terminal opens for Claude Code to enhance SKILL.md)
+4. "Package skill at output/svelte/"
+```
+
+### Using Presets
+
+```
+1. "List all available configs"
+2. "Scrape docs using configs/react.json"
+3. "Package skill at output/react/"
+```
+
+## Troubleshooting
+
+### MCP Server Not Loading
+
+1. Check MCP config path: `cat ~/.config/claude-code/mcp.json`
+2. Verify Python path: `which python3`
+3. Test server manually: `python3 mcp/server.py`
+4. Check Claude Code logs
+
+### Tools Not Appearing
+
+1. Restart Claude Code completely
+2. Verify mcp package is installed: `pip show mcp`
+3. Check server.py has execute permissions: `chmod +x mcp/server.py`
+
+### Import Errors
+
+Make sure you're running commands from the repository root:
+```bash
+cd /path/to/Skill_Seekers
+python3 mcp/server.py
+```
+
+## Architecture
+
+```
+Skill_Seekers/
+├── cli/                    # CLI tools (used by MCP)
+│   ├── doc_scraper.py
+│   ├── estimate_pages.py
+│   ├── enhance_skill.py
+│   ├── package_skill.py
+│   └── ...
+├── mcp/                    # MCP server
+│   ├── server.py          # Main MCP server
+│   ├── requirements.txt   # MCP dependencies
+│   └── README.md         # This file
+├── configs/               # Shared configs
+└── output/                # Generated skills
+```
+
+## Development
+
+### Adding New Tools
+
+Edit `mcp/server.py`:
+
+```python
+# 1. Add tool definition to list_tools()
+Tool(
+    name="my_tool",
+    description="Tool description",
+    inputSchema={...}
+)
+
+# 2. Add tool handler to call_tool()
+elif name == "my_tool":
+    return await my_tool_handler(arguments)
+
+# 3. Implement handler
+async def my_tool_handler(args: dict) -> list[TextContent]:
+    # Tool logic here
+    return [TextContent(type="text", text=result)]
+```
+
+### Testing
+
+```bash
+# Test server manually
+python3 mcp/server.py
+
+# Test with MCP inspector (if available)
+mcp-inspector mcp/server.py
+```
+
+## Links
+
+- [Main CLI Documentation](../README.md)
+- [MCP Protocol](https://modelcontextprotocol.io/)
+- [Claude Code](https://claude.ai/code)
+
+## License
+
+Same as parent project (see ../LICENSE)
--- a/mcp/requirements.txt
+++ b/mcp/requirements.txt
@@ -0,0 +1,9 @@
+# MCP Server dependencies
+mcp>=1.0.0
+
+# CLI tool dependencies (shared)
+requests>=2.31.0
+beautifulsoup4>=4.12.0
+
+# Optional: for API-based enhancement
+# anthropic>=0.18.0
--- a/mcp/server.py
+++ b/mcp/server.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""
+Skill Seeker MCP Server
+Model Context Protocol server for generating Claude AI skills from documentation
+"""
+
+import asyncio
+import json
+import subprocess
+import sys
+from pathlib import Path
+from typing import Any
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+try:
+    from mcp.server import Server
+    from mcp.types import Tool, TextContent
+except ImportError:
+    print("❌ Error: mcp package not installed")
+    print("Install with: pip install mcp")
+    sys.exit(1)
+
+
+# Initialize MCP server
+app = Server("skill-seeker")
+
+# Path to CLI tools
+CLI_DIR = Path(__file__).parent.parent / "cli"
+
+
+@app.list_tools()
+async def list_tools() -> list[Tool]:
+    """List available tools"""
+    return [
+        Tool(
+            name="generate_config",
+            description="Generate a config file for documentation scraping. Interactively creates a JSON config for any documentation website.",
+            inputSchema={
+                "type": "object",
+                "properties": {
+                    "name": {
+                        "type": "string",
+                        "description": "Skill name (lowercase, alphanumeric, hyphens, underscores)",
+                    },
+                    "url": {
+                        "type": "string",
+                        "description": "Base documentation URL (must include http:// or https://)",
+                    },
+                    "description": {
+                        "type": "string",
+                        "description": "Description of when to use this skill",
+                    },
+                    "max_pages": {
+                        "type": "integer",
+                        "description": "Maximum pages to scrape (default: 100)",
+                        "default": 100,
+                    },
+                    "rate_limit": {
+                        "type": "number",
+                        "description": "Delay between requests in seconds (default: 0.5)",
+                        "default": 0.5,
+                    },
+                },
+                "required": ["name", "url", "description"],
+            },
+        ),
+        Tool(
+            name="estimate_pages",
+            description="Estimate how many pages will be scraped from a config. Fast preview without downloading content.",
+            inputSchema={
+                "type": "object",
+                "properties": {
+                    "config_path": {
+                        "type": "string",
+                        "description": "Path to config JSON file (e.g., configs/react.json)",
+                    },
+                    "max_discovery": {
+                        "type": "integer",
+                        "description": "Maximum pages to discover during estimation (default: 1000)",
+                        "default": 1000,
+                    },
+                },
+                "required": ["config_path"],
+            },
+        ),
+        Tool(
+            name="scrape_docs",
+            description="Scrape documentation and build Claude skill. Creates SKILL.md and reference files.",
+            inputSchema={
+                "type": "object",
+                "properties": {
+                    "config_path": {
+                        "type": "string",
+                        "description": "Path to config JSON file (e.g., configs/react.json)",
+                    },
+                    "enhance_local": {
+                        "type": "boolean",
+                        "description": "Open terminal for local enhancement with Claude Code (default: false)",
+                        "default": False,
+                    },
+                    "skip_scrape": {
+                        "type": "boolean",
+                        "description": "Skip scraping, use cached data (default: false)",
+                        "default": False,
+                    },
+                    "dry_run": {
+                        "type": "boolean",
+                        "description": "Preview what will be scraped without saving (default: false)",
+                        "default": False,
+                    },
+                },
+                "required": ["config_path"],
+            },
+        ),
+        Tool(
+            name="package_skill",
+            description="Package a skill directory into a .zip file ready for Claude upload.",
+            inputSchema={
+                "type": "object",
+                "properties": {
+                    "skill_dir": {
+                        "type": "string",
+                        "description": "Path to skill directory (e.g., output/react/)",
+                    },
+                },
+                "required": ["skill_dir"],
+            },
+        ),
+        Tool(
+            name="list_configs",
+            description="List all available preset configurations.",
+            inputSchema={
+                "type": "object",
+                "properties": {},
+            },
+        ),
+        Tool(
+            name="validate_config",
+            description="Validate a config file for errors.",
+            inputSchema={
+                "type": "object",
+                "properties": {
+                    "config_path": {
+                        "type": "string",
+                        "description": "Path to config JSON file",
+                    },
+                },
+                "required": ["config_path"],
+            },
+        ),
+    ]
+
+
+@app.call_tool()
+async def call_tool(name: str, arguments: Any) -> list[TextContent]:
+    """Handle tool calls"""
+
+    try:
+        if name == "generate_config":
+            return await generate_config_tool(arguments)
+        elif name == "estimate_pages":
+            return await estimate_pages_tool(arguments)
+        elif name == "scrape_docs":
+            return await scrape_docs_tool(arguments)
+        elif name == "package_skill":
+            return await package_skill_tool(arguments)
+        elif name == "list_configs":
+            return await list_configs_tool(arguments)
+        elif name == "validate_config":
+            return await validate_config_tool(arguments)
+        else:
+            return [TextContent(type="text", text=f"Unknown tool: {name}")]
+
+    except Exception as e:
+        return [TextContent(type="text", text=f"Error: {str(e)}")]
+
+
+async def generate_config_tool(args: dict) -> list[TextContent]:
+    """Generate a config file"""
+    name = args["name"]
+    url = args["url"]
+    description = args["description"]
+    max_pages = args.get("max_pages", 100)
+    rate_limit = args.get("rate_limit", 0.5)
+
+    # Create config
+    config = {
+        "name": name,
+        "description": description,
+        "base_url": url,
+        "selectors": {
+            "main_content": "article",
+            "title": "h1",
+            "code_blocks": "pre code"
+        },
+        "url_patterns": {
+            "include": [],
+            "exclude": []
+        },
+        "categories": {},
+        "rate_limit": rate_limit,
+        "max_pages": max_pages
+    }
+
+    # Save to configs directory
+    config_path = Path("configs") / f"{name}.json"
+    config_path.parent.mkdir(exist_ok=True)
+
+    with open(config_path, 'w') as f:
+        json.dump(config, f, indent=2)
+
+    result = f"""✅ Config created: {config_path}
+
+Configuration:
+  Name: {name}
+  URL: {url}
+  Max pages: {max_pages}
+  Rate limit: {rate_limit}s
+
+Next steps:
+  1. Review/edit config: cat {config_path}
+  2. Estimate pages: Use estimate_pages tool
+  3. Scrape docs: Use scrape_docs tool
+
+Note: Default selectors may need adjustment for your documentation site.
+"""
+
+    return [TextContent(type="text", text=result)]
+
+
+async def estimate_pages_tool(args: dict) -> list[TextContent]:
+    """Estimate page count"""
+    config_path = args["config_path"]
+    max_discovery = args.get("max_discovery", 1000)
+
+    # Run estimate_pages.py
+    cmd = [
+        sys.executable,
+        str(CLI_DIR / "estimate_pages.py"),
+        config_path,
+        "--max-discovery", str(max_discovery)
+    ]
+
+    result = subprocess.run(cmd, capture_output=True, text=True)
+
+    if result.returncode == 0:
+        return [TextContent(type="text", text=result.stdout)]
+    else:
+        return [TextContent(type="text", text=f"Error: {result.stderr}")]
+
+
+async def scrape_docs_tool(args: dict) -> list[TextContent]:
+    """Scrape documentation"""
+    config_path = args["config_path"]
+    enhance_local = args.get("enhance_local", False)
+    skip_scrape = args.get("skip_scrape", False)
+    dry_run = args.get("dry_run", False)
+
+    # Build command
+    cmd = [
+        sys.executable,
+        str(CLI_DIR / "doc_scraper.py"),
+        "--config", config_path
+    ]
+
+    if enhance_local:
+        cmd.append("--enhance-local")
+    if skip_scrape:
+        cmd.append("--skip-scrape")
+    if dry_run:
+        cmd.append("--dry-run")
+
+    # Run doc_scraper.py
+    result = subprocess.run(cmd, capture_output=True, text=True)
+
+    if result.returncode == 0:
+        return [TextContent(type="text", text=result.stdout)]
+    else:
+        return [TextContent(type="text", text=f"Error: {result.stderr}\n{result.stdout}")]
+
+
+async def package_skill_tool(args: dict) -> list[TextContent]:
+    """Package skill to .zip"""
+    skill_dir = args["skill_dir"]
+
+    # Run package_skill.py
+    cmd = [
+        sys.executable,
+        str(CLI_DIR / "package_skill.py"),
+        skill_dir
+    ]
+
+    result = subprocess.run(cmd, capture_output=True, text=True)
+
+    if result.returncode == 0:
+        return [TextContent(type="text", text=result.stdout)]
+    else:
+        return [TextContent(type="text", text=f"Error: {result.stderr}")]
+
+
+async def list_configs_tool(args: dict) -> list[TextContent]:
+    """List available configs"""
+    configs_dir = Path("configs")
+
+    if not configs_dir.exists():
+        return [TextContent(type="text", text="No configs directory found")]
+
+    configs = list(configs_dir.glob("*.json"))
+
+    if not configs:
+        return [TextContent(type="text", text="No config files found")]
+
+    result = "📋 Available Configs:\n\n"
+
+    for config_file in sorted(configs):
+        try:
+            with open(config_file) as f:
+                config = json.load(f)
+                name = config.get("name", config_file.stem)
+                desc = config.get("description", "No description")
+                url = config.get("base_url", "")
+
+                result += f"  • {config_file.name}\n"
+                result += f"    Name: {name}\n"
+                result += f"    URL: {url}\n"
+                result += f"    Description: {desc}\n\n"
+        except Exception as e:
+            result += f"  • {config_file.name} - Error reading: {e}\n\n"
+
+    return [TextContent(type="text", text=result)]
+
+
+async def validate_config_tool(args: dict) -> list[TextContent]:
+    """Validate a config file"""
+    config_path = args["config_path"]
+
+    # Import validation function
+    sys.path.insert(0, str(CLI_DIR))
+    from doc_scraper import load_config, validate_config
+
+    try:
+        config = load_config(config_path)
+        errors = validate_config(config)
+
+        if errors:
+            result = f"❌ Config validation failed:\n\n"
+            for error in errors:
+                result += f"  • {error}\n"
+        else:
+            result = f"✅ Config is valid!\n\n"
+            result += f"  Name: {config['name']}\n"
+            result += f"  Base URL: {config['base_url']}\n"
+            result += f"  Max pages: {config.get('max_pages', 'Not set')}\n"
+            result += f"  Rate limit: {config.get('rate_limit', 'Not set')}s\n"
+
+        return [TextContent(type="text", text=result)]
+
+    except Exception as e:
+        return [TextContent(type="text", text=f"❌ Error: {str(e)}")]
+
+
+async def main():
+    """Run the MCP server"""
+    from mcp.server.stdio import stdio_server
+
+    async with stdio_server() as (read_stream, write_stream):
+        await app.run(
+            read_stream,
+            write_stream,
+            app.create_initialization_options()
+        )
+
+
+if __name__ == "__main__":
+    asyncio.run(main())