Update documentation to include PDF support

- Add PDF support to README.md Key Features
- Add PDF CLI example (Option 3)
- Update MCP README from 9 to 10 tools
- Add scrape_pdf tool documentation
- Add PDF workflow example
- Update tool descriptions

All main documentation now reflects PDF functionality

README.md (18 lines changed)

@@ -34,6 +34,7 @@ Skill Seeker is an automated tool that transforms any documentation website into
## Key Features

✅ **Universal Scraper** - Works with ANY documentation website
✅ **PDF Documentation Support** - Extract text, code, and images from PDF files (**NEW!**)
✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
✅ **MCP Server for Claude Code** - Use directly from Claude Code with natural language
✅ **Large Documentation Support** - Handle 10K-40K+ page docs with intelligent splitting

@@ -57,11 +58,12 @@ Skill Seeker is an automated tool that transforms any documentation website into

# Then in Claude Code, just ask:
"Generate a React skill from https://react.dev/"
"Scrape PDF at docs/manual.pdf and create skill"
```

**Time:** Automated | **Quality:** Production-ready | **Cost:** Free

-### Option 2: Use CLI Directly
+### Option 2: Use CLI Directly (HTML Docs)

```bash
# Install dependencies (2 pip packages)

@@ -75,6 +77,20 @@ python3 cli/doc_scraper.py --config configs/react.json --enhance-local

**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free

### Option 3: Use CLI for PDF Documentation

```bash
# Install PDF support
pip3 install PyMuPDF

# Extract and convert PDF to skill
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill

# Upload output/myskill.zip to Claude - Done!
```

**Time:** ~5-15 minutes | **Quality:** Production-ready | **Cost:** Free
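The Option 3 command takes a PDF path (`--pdf`) and a skill name (`--name`), and the result is uploaded as `output/myskill.zip`. The sketch below is a hypothetical argparse front end for that interface. It assumes only those two flags, a skill directory at `output/<name>/`, and an archive at `output/<name>.zip`. `parse_args` and `output_paths` are illustrative names, not functions from `cli/pdf_scraper.py`, and the real script may accept more options.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags shown in Option 3 (hypothetical front end)."""
    parser = argparse.ArgumentParser(description="Convert a PDF into a Claude skill")
    parser.add_argument("--pdf", required=True, help="path to the source PDF")
    parser.add_argument("--name", required=True, help="skill name, used for output paths")
    return parser.parse_args(argv)


def output_paths(name, out_dir="output"):
    """Assumed layout: output/<name>/ for the skill, output/<name>.zip for upload."""
    base = Path(out_dir)
    return base / name, base / f"{name}.zip"


if __name__ == "__main__":
    args = parse_args(["--pdf", "docs/manual.pdf", "--name", "myskill"])
    skill_dir, archive = output_paths(args.name)
    print(f"{args.pdf} -> {skill_dir.as_posix()} -> {archive.as_posix()}")
```

With `--name myskill`, the archive path comes out as `output/myskill.zip`, which is the file the Option 3 comment says to upload.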

## How It Works

```mermaid

@@ -11,8 +11,9 @@ This MCP server allows Claude Code to use Skill Seeker's tools directly through
- Scrape documentation and build skills
- Package skills into `.zip` files
- List and validate configurations
-- **NEW:** Split large documentation (10K-40K+ pages) into focused sub-skills
-- **NEW:** Generate intelligent router/hub skills for split documentation
+- Split large documentation (10K-40K+ pages) into focused sub-skills
+- Generate intelligent router/hub skills for split documentation
+- **NEW:** Scrape PDF documentation and extract code/images
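The list above says only that large documentation (10K-40K+ pages) is split into focused sub-skills. One hypothetical way to do this is to group pages by category and then cut each group into chunks of bounded size. Both the `category` key on each page and the 5,000-page cap are invented for illustration and are not the tool's actual splitting rules.

```python
from collections import defaultdict


def split_pages(pages, max_pages=5000):
    """Group pages by category, then cut each group into chunks of at most max_pages.

    Returns a dict mapping sub-skill names (e.g. "scripting-1") to page lists.
    Pages without a "category" key fall into "misc" (an assumption of this sketch).
    """
    by_category = defaultdict(list)
    for page in pages:
        by_category[page.get("category", "misc")].append(page)

    sub_skills = {}
    for category, group in sorted(by_category.items()):
        for start in range(0, len(group), max_pages):
            index = start // max_pages + 1
            sub_skills[f"{category}-{index}"] = group[start:start + max_pages]
    return sub_skills
```

For example, 12,000 "scripting" pages and 3,000 "physics" pages yield `scripting-1` to `scripting-3` (5,000, 5,000, and 2,000 pages) plus `physics-1`. A router skill then sits in front of these sub-skills.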

## Quick Start

@@ -72,7 +73,7 @@ You should see a list of preset configurations (Godot, React, Vue, etc.).

## Available Tools

-The MCP server exposes 9 tools:
+The MCP server exposes 10 tools:

### 1. `generate_config`
Create a new configuration file for any documentation website.

@@ -197,6 +198,35 @@ Generate router for configs/godot-*.json
- Creates router SKILL.md with intelligent routing logic
- Users can ask questions naturally, router directs to appropriate sub-skill
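The router is described as taking natural-language questions and directing each one to the appropriate sub-skill. The function below is a hypothetical keyword-overlap router that only illustrates the idea. The sub-skill names and keyword sets are invented, and the generated router SKILL.md may express its routing logic quite differently.

```python
import re


def route(question, sub_skills, default="overview"):
    """Return the sub-skill whose keyword set overlaps most with the question.

    Falls back to `default` when no keyword matches.
    """
    words = set(re.findall(r"[a-z0-9_]+", question.lower()))
    best_name, best_score = default, 0
    for name, keywords in sub_skills.items():
        score = len(words & keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical sub-skills, named after the configs/godot-*.json pattern.
SUB_SKILLS = {
    "godot-scripting": {"gdscript", "signal", "node", "script"},
    "godot-physics": {"collision", "rigidbody", "physics", "raycast"},
}
```

A question such as "How do I detect a collision with a RigidBody?" matches two physics keywords and no scripting keywords, so this sketch routes it to `godot-physics`.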

### 10. `scrape_pdf`
Scrape PDF documentation and build Claude skill. Extracts text, code blocks, and images from PDF files.

**Parameters:**
- `config_path` (optional): Path to PDF config JSON file (e.g., "configs/manual_pdf.json")
- `pdf_path` (optional): Direct PDF path (alternative to config_path)
- `name` (optional): Skill name (required with pdf_path)
- `description` (optional): Skill description
- `from_json` (optional): Build from extracted JSON file (e.g., "output/manual_extracted.json")
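Every `scrape_pdf` parameter is marked optional, but they depend on each other: `pdf_path` is an alternative to `config_path`, and `name` becomes required once `pdf_path` is used. The sketch below checks those rules. It assumes exactly one of the three input sources (`config_path`, `pdf_path`, `from_json`) must be given. `validate_scrape_pdf_args` is a hypothetical helper, not part of the MCP server.

```python
def validate_scrape_pdf_args(config_path=None, pdf_path=None, name=None, from_json=None):
    """Check scrape_pdf argument combinations (hypothetical helper).

    Assumes exactly one input source is required. `description` is left out
    because it is optional and needs no cross-checking.
    Returns (source_kind, value) or raises ValueError.
    """
    sources = {"config_path": config_path, "pdf_path": pdf_path, "from_json": from_json}
    given = [(kind, value) for kind, value in sources.items() if value]
    if len(given) != 1:
        raise ValueError("provide exactly one of config_path, pdf_path, from_json")
    kind, value = given[0]
    if kind == "pdf_path" and not name:
        raise ValueError("name is required when pdf_path is used")
    return kind, value
```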

**Examples:**
```
Scrape PDF at docs/manual.pdf and create skill named api-docs
Create skill from configs/example_pdf.json
Build skill from output/manual_extracted.json
```

**What it does:**
- Extracts text and markdown from PDF pages
- Detects code blocks using 3 methods (font, indent, pattern)
- Detects programming language with confidence scoring (19+ languages)
- Validates syntax and scores code quality (0-10 scale)
- Extracts images with size filtering
- Detects chapters and creates page chunks
- Categorizes content automatically
- Generates complete skill structure (SKILL.md + references)
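The list above names indent and pattern heuristics for finding code blocks, plus language detection with a confidence score. Below is a minimal sketch of both ideas. The regex signatures, the 4-space indent, and the 0.6 threshold are all assumptions. The font method is omitted because it needs PDF font metadata. Confidence here is simply the winning language's share of all pattern matches, which may not be how the scraper computes it.

```python
import re

# Hypothetical per-language regex signatures. A real detector would cover far
# more languages (the README mentions 19+) and weight its patterns.
LANGUAGE_PATTERNS = {
    "python": [r"^\s*def \w+\(", r"^\s*import \w+", r"^\s*class \w+.*:\s*$", r"\bprint\("],
    "javascript": [r"\bfunction\s+\w+\s*\(", r"\bconst\s+\w+\s*=", r"=>", r"\bconsole\.log\("],
    "cpp": [r"#include\s*<", r"\bstd::", r"\bint\s+main\s*\("],
}

# Pattern heuristic: a line ending in a brace or semicolon, or starting with a keyword.
CODE_LINE = re.compile(r"[{};]\s*$|^\s*(def|class|import|#include|function|const)\b")


def looks_like_code(lines, min_indent=4, threshold=0.6):
    """Indent + pattern heuristics over one text block extracted from a page."""
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        return False
    hits = sum(
        1
        for line in non_empty
        if len(line) - len(line.lstrip(" ")) >= min_indent or CODE_LINE.search(line)
    )
    return hits / len(non_empty) >= threshold


def detect_language(code):
    """Return (language, confidence): the winning language's share of all matches."""
    scores = {
        lang: sum(len(re.findall(p, code, re.MULTILINE)) for p in patterns)
        for lang, patterns in LANGUAGE_PATTERNS.items()
    }
    total = sum(scores.values())
    if total == 0:
        return "unknown", 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total
```

Running the detector only on blocks that `looks_like_code` accepts keeps ordinary prose out of the language-detection step.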

**See:** `docs/PDF_SCRAPER.md` for complete PDF documentation guide
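For the "Validates syntax and scores code quality (0-10 scale)" step, Python snippets can at least be syntax-checked with the standard `ast` module. The scoring rubric below is invented purely to show what a 0-10 score might be built from. It is not the scraper's actual set of penalties, which this section does not describe.

```python
import ast


def score_python_snippet(code):
    """Return a 0-10 score for a Python snippet pulled from a PDF.

    Hypothetical rubric, for illustration only:
      * a syntax error scores 0;
      * otherwise start at 10 and subtract
        3 for a one-line fragment shorter than 40 characters,
        2 if there is no function or class definition and no call,
        2 if any line exceeds 120 characters (often a sign of lines
          merged during PDF text extraction).
    """
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0
    score = 10
    lines = [line for line in code.splitlines() if line.strip()]
    if len(lines) <= 1 and len(code.strip()) < 40:
        score -= 3
    meaningful = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Call)
    if not any(isinstance(node, meaningful) for node in ast.walk(tree)):
        score -= 2
    if any(len(line) > 120 for line in lines):
        score -= 2
    return max(score, 0)
```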

## Example Workflows

### Generate a New Skill from Scratch

@@ -252,7 +282,25 @@ User: Scrape docs using configs/godot.json
Claude: [Starts scraping...]
```

-### Large Documentation (40K Pages) - NEW
+### PDF Documentation - NEW

```
User: Scrape PDF at docs/api-manual.pdf and create skill named api-docs

Claude: 📄 Scraping PDF documentation...
✅ Extracted 120 pages
✅ Found 45 code blocks (Python, JavaScript, C++)
✅ Extracted 12 images
✅ Created skill at output/api-docs/
📦 Package with: python3 cli/package_skill.py output/api-docs/

User: Package skill at output/api-docs/

Claude: ✅ Created: output/api-docs.zip
Ready to upload to Claude!
```
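The workflow ends by packaging `output/api-docs/` into `output/api-docs.zip`. The sketch below does this with the standard library, assuming packaging amounts to zipping the skill directory's files with paths relative to that directory. The real `cli/package_skill.py` may validate the skill or lay out the archive differently. The `references/api.md` entry is a hypothetical file name.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir):
    """Zip every file under skill_dir into a sibling <name>.zip.

    Entries are stored relative to skill_dir (e.g. "SKILL.md",
    "references/api.md"); that layout is an assumption of this sketch.
    Returns the archive path.
    """
    skill_dir = Path(skill_dir)
    archive = skill_dir.parent / f"{skill_dir.name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(skill_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(skill_dir).as_posix())
    return archive
```

The archive name is built from `Path.name` rather than `with_suffix`, so a skill name containing a dot (say `v1.2-docs`) still produces `v1.2-docs.zip`.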

### Large Documentation (40K Pages)

```
User: Estimate pages for configs/godot.json