Update documentation to include PDF support

- Add PDF support to README.md Key Features
- Add PDF CLI example (Option 3)
- Update MCP README from 9 to 10 tools
- Add scrape_pdf tool documentation
- Add PDF workflow example
- Update tool descriptions

All main documentation now reflects PDF functionality
2025-10-23 00:33:44 +03:00
parent 6936057820
commit 8ebd736055
2 changed files with 69 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -34,6 +34,7 @@ Skill Seeker is an automated tool that transforms any documentation website into
 ## Key Features

 ✅ **Universal Scraper** - Works with ANY documentation website
+✅ **PDF Documentation Support** - Extract text, code, and images from PDF files (**NEW!**)
 ✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
 ✅ **MCP Server for Claude Code** - Use directly from Claude Code with natural language
 ✅ **Large Documentation Support** - Handle 10K-40K+ page docs with intelligent splitting
@@ -57,11 +58,12 @@ Skill Seeker is an automated tool that transforms any documentation website into

 # Then in Claude Code, just ask:
 "Generate a React skill from https://react.dev/"
+"Scrape PDF at docs/manual.pdf and create skill"
 ```

 **Time:** Automated | **Quality:** Production-ready | **Cost:** Free

-### Option 2: Use CLI Directly
+### Option 2: Use CLI Directly (HTML Docs)

 ```bash
 # Install dependencies (2 pip packages)
@@ -75,6 +77,20 @@ python3 cli/doc_scraper.py --config configs/react.json --enhance-local

 **Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free

+### Option 3: Use CLI for PDF Documentation
+
+```bash
+# Install PDF support
+pip3 install PyMuPDF
+
+# Extract and convert PDF to skill
+python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill
+
+# Upload output/myskill.zip to Claude - Done!
+```
+
+**Time:** ~5-15 minutes | **Quality:** Production-ready | **Cost:** Free
+
 ## How It Works

 ```mermaid
--- a/mcp/README.md
+++ b/mcp/README.md
@@ -11,8 +11,9 @@ This MCP server allows Claude Code to use Skill Seeker's tools directly through
 - Scrape documentation and build skills
 - Package skills into `.zip` files
 - List and validate configurations
-- **NEW:** Split large documentation (10K-40K+ pages) into focused sub-skills
-- **NEW:** Generate intelligent router/hub skills for split documentation
+- Split large documentation (10K-40K+ pages) into focused sub-skills
+- Generate intelligent router/hub skills for split documentation
+- **NEW:** Scrape PDF documentation and extract code/images

 ## Quick Start

@@ -72,7 +73,7 @@ You should see a list of preset configurations (Godot, React, Vue, etc.).

 ## Available Tools

-The MCP server exposes 9 tools:
+The MCP server exposes 10 tools:

 ### 1. `generate_config`
 Create a new configuration file for any documentation website.
@@ -197,6 +198,35 @@ Generate router for configs/godot-*.json
 - Creates router SKILL.md with intelligent routing logic
 - Users can ask questions naturally, router directs to appropriate sub-skill

+### 10. `scrape_pdf`
+Scrape PDF documentation and build a Claude skill. Extracts text, code blocks, and images from PDF files.
+
+**Parameters:**
+- `config_path` (optional): Path to PDF config JSON file (e.g., "configs/manual_pdf.json")
+- `pdf_path` (optional): Direct PDF path (alternative to config_path)
+- `name` (optional): Skill name (required with pdf_path)
+- `description` (optional): Skill description
+- `from_json` (optional): Build from extracted JSON file (e.g., "output/manual_extracted.json")
+
+**Examples:**
+```
+Scrape PDF at docs/manual.pdf and create skill named api-docs
+Create skill from configs/example_pdf.json
+Build skill from output/manual_extracted.json
+```
+
+**What it does:**
+- Extracts text and markdown from PDF pages
+- Detects code blocks using 3 methods (font, indent, pattern)
+- Detects programming language with confidence scoring (19+ languages)
+- Validates syntax and scores code quality (0-10 scale)
+- Extracts images with size filtering
+- Detects chapters and creates page chunks
+- Categorizes content automatically
+- Generates complete skill structure (SKILL.md + references)
+
+**See:** `docs/PDF_SCRAPER.md` for complete PDF documentation guide
+
 ## Example Workflows

 ### Generate a New Skill from Scratch
@@ -252,7 +282,25 @@ User: Scrape docs using configs/godot.json
 Claude: [Starts scraping...]
 ```

-### Large Documentation (40K Pages) - NEW
+### PDF Documentation - NEW
+
+```
+User: Scrape PDF at docs/api-manual.pdf and create skill named api-docs
+
+Claude: 📄 Scraping PDF documentation...
+        ✅ Extracted 120 pages
+        ✅ Found 45 code blocks (Python, JavaScript, C++)
+        ✅ Extracted 12 images
+        ✅ Created skill at output/api-docs/
+        📦 Package with: python3 cli/package_skill.py output/api-docs/
+
+User: Package skill at output/api-docs/
+
+Claude: ✅ Created: output/api-docs.zip
+        Ready to upload to Claude!
+```
+
+### Large Documentation (40K Pages)

 ```
 User: Estimate pages for configs/godot.json