Update documentation to include PDF support

- Add PDF support to README.md Key Features
- Add PDF CLI example (Option 3)
- Update MCP README from 9 to 10 tools
- Add scrape_pdf tool documentation
- Add PDF workflow example
- Update tool descriptions

All main documentation now reflects PDF functionality
yusyus
2025-10-23 00:33:44 +03:00
parent 6936057820
commit 8ebd736055
2 changed files with 69 additions and 5 deletions


@@ -34,6 +34,7 @@ Skill Seeker is an automated tool that transforms any documentation website into
## Key Features
**Universal Scraper** - Works with ANY documentation website
**PDF Documentation Support** - Extract text, code, and images from PDF files (**NEW!**)
**AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
**MCP Server for Claude Code** - Use directly from Claude Code with natural language
**Large Documentation Support** - Handle 10K-40K+ page docs with intelligent splitting
@@ -57,11 +58,12 @@ Skill Seeker is an automated tool that transforms any documentation website into
# Then in Claude Code, just ask:
"Generate a React skill from https://react.dev/"
"Scrape PDF at docs/manual.pdf and create skill"
```
**Time:** Automated | **Quality:** Production-ready | **Cost:** Free
-### Option 2: Use CLI Directly
+### Option 2: Use CLI Directly (HTML Docs)
```bash
# Install dependencies (2 pip packages)
@@ -75,6 +77,20 @@ python3 cli/doc_scraper.py --config configs/react.json --enhance-local
**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free
### Option 3: Use CLI for PDF Documentation
```bash
# Install PDF support
pip3 install PyMuPDF
# Extract and convert PDF to skill
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill
# Upload output/myskill.zip to Claude - Done!
```
**Time:** ~5-15 minutes | **Quality:** Production-ready | **Cost:** Free
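When several PDFs need converting, the CLI call shown above can be assembled programmatically. The following is only a sketch: `build_pdf_command` is a hypothetical helper, and defaulting the skill name to the PDF's file stem is an assumption, not documented behavior. Only the `--pdf` and `--name` flags from the example above are used.

```python
from pathlib import Path


def build_pdf_command(pdf_path, name=None):
    """Build the argument list for the PDF scraper CLI call shown above."""
    pdf = Path(pdf_path)
    # Assumption: fall back to the file stem when no skill name is given.
    skill_name = name or pdf.stem
    return [
        "python3", "cli/pdf_scraper.py",
        "--pdf", pdf.as_posix(),
        "--name", skill_name,
    ]


if __name__ == "__main__":
    import shlex

    # Print the commands instead of running them, so the sketch is safe to try.
    for pdf in ["docs/manual.pdf", "docs/api-guide.pdf"]:
        print(shlex.join(build_pdf_command(pdf)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually run the scraper.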
## How It Works
```mermaid


@@ -11,8 +11,9 @@ This MCP server allows Claude Code to use Skill Seeker's tools directly through
- Scrape documentation and build skills
- Package skills into `.zip` files
- List and validate configurations
-- **NEW:** Split large documentation (10K-40K+ pages) into focused sub-skills
-- **NEW:** Generate intelligent router/hub skills for split documentation
+- Split large documentation (10K-40K+ pages) into focused sub-skills
+- Generate intelligent router/hub skills for split documentation
+- **NEW:** Scrape PDF documentation and extract code/images
## Quick Start
@@ -72,7 +73,7 @@ You should see a list of preset configurations (Godot, React, Vue, etc.).
## Available Tools
-The MCP server exposes 9 tools:
+The MCP server exposes 10 tools:
### 1. `generate_config`
Create a new configuration file for any documentation website.
@@ -197,6 +198,35 @@ Generate router for configs/godot-*.json
- Creates router SKILL.md with intelligent routing logic
- Users can ask questions naturally, router directs to appropriate sub-skill
### 10. `scrape_pdf`
Scrape PDF documentation and build a Claude skill. Extracts text, code blocks, and images from PDF files.
**Parameters:**
- `config_path` (optional): Path to PDF config JSON file (e.g., "configs/manual_pdf.json")
- `pdf_path` (optional): Direct PDF path (alternative to config_path)
- `name` (optional): Skill name (required with pdf_path)
- `description` (optional): Skill description
- `from_json` (optional): Build from extracted JSON file (e.g., "output/manual_extracted.json")
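The parameter rules above can be checked before calling the tool. This is a minimal sketch, not the server's actual validation: `validate_scrape_pdf_args` is a hypothetical helper, and the requirement that at least one of `config_path`, `pdf_path`, or `from_json` be supplied is an assumption inferred from the parameter list. Only the "`name` is required with `pdf_path`" rule comes from the documentation.

```python
def validate_scrape_pdf_args(config_path=None, pdf_path=None, name=None,
                             description=None, from_json=None):
    """Return a list of problems with a scrape_pdf argument set (empty if OK)."""
    problems = []

    # Assumption: at least one input source must be supplied.
    if not any((config_path, pdf_path, from_json)):
        problems.append("provide config_path, pdf_path, or from_json")

    # Documented rule: name is required when pdf_path is used.
    if pdf_path and not name:
        problems.append("name is required with pdf_path")

    # description is always optional, so it is accepted but not checked.
    return problems


if __name__ == "__main__":
    print(validate_scrape_pdf_args(pdf_path="docs/manual.pdf"))
    print(validate_scrape_pdf_args(pdf_path="docs/manual.pdf", name="api-docs"))
```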
**Examples:**
```
Scrape PDF at docs/manual.pdf and create skill named api-docs
Create skill from configs/example_pdf.json
Build skill from output/manual_extracted.json
```
**What it does:**
- Extracts text and markdown from PDF pages
- Detects code blocks using 3 methods (font, indent, pattern)
- Detects programming language with confidence scoring (19+ languages)
- Validates syntax and scores code quality (0-10 scale)
- Extracts images with size filtering
- Detects chapters and creates page chunks
- Categorizes content automatically
- Generates complete skill structure (SKILL.md + references)
**See:** `docs/PDF_SCRAPER.md` for complete PDF documentation guide
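To give a feel for the indentation-based detection listed above, here is a deliberately simplified sketch. It is not the scraper's implementation: `find_indented_blocks` and its thresholds (`min_indent`, `min_lines`) are hypothetical, and the font-based and pattern-based methods are not shown.

```python
def find_indented_blocks(text, min_indent=4, min_lines=2):
    """Return runs of consecutive lines indented by at least min_indent spaces."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if stripped and indent >= min_indent:
            # Drop the common indent but keep any nested indentation.
            current.append(line[min_indent:])
        else:
            # A blank or unindented line ends the current run.
            if len(current) >= min_lines:
                blocks.append("\n".join(current))
            current = []
    # Flush a run that reaches the end of the text.
    if len(current) >= min_lines:
        blocks.append("\n".join(current))
    return blocks


if __name__ == "__main__":
    page_text = "Intro text\n    def f():\n        return 1\nMore text\n"
    for block in find_indented_blocks(page_text):
        print(block)
```

A heuristic like this produces false positives on indented prose such as block quotes, which is presumably one reason to combine several detection methods and score the results.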
## Example Workflows
### Generate a New Skill from Scratch
@@ -252,7 +282,25 @@ User: Scrape docs using configs/godot.json
Claude: [Starts scraping...]
```
-### Large Documentation (40K Pages) - NEW
+### PDF Documentation - NEW
```
User: Scrape PDF at docs/api-manual.pdf and create skill named api-docs
Claude: 📄 Scraping PDF documentation...
✅ Extracted 120 pages
✅ Found 45 code blocks (Python, JavaScript, C++)
✅ Extracted 12 images
✅ Created skill at output/api-docs/
📦 Package with: python3 cli/package_skill.py output/api-docs/
User: Package skill at output/api-docs/
Claude: ✅ Created: output/api-docs.zip
Ready to upload to Claude!
```
### Large Documentation (40K Pages)
```
User: Estimate pages for configs/godot.json