Release v1.18.1: Enhance markdown-tools with PDF image extraction

- Add extract_pdf_images.py script using PyMuPDF
- Refactor SKILL.md for clearer workflow documentation
- Update installation to use markitdown[pdf] extra
- Update marketplace version to 1.18.1
- Update markdown-tools version to 1.1.0
- Update README/README.zh-CN with new features
- Update QUICKSTART docs with in-app install instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
daymade
2025-12-28 18:46:15 +08:00
parent 515514b058
commit 8233430cf2
9 changed files with 264 additions and 127 deletions

@@ -6,7 +6,7 @@
},
"metadata": {
"description": "Professional Claude Code skills for GitHub operations, document conversion, diagram generation, statusline customization, Teams communication, repomix utilities, skill creation, CLI demo generation, LLM icon access, Cloudflare troubleshooting, UI design system extraction, professional presentation creation, YouTube video downloading, secure repomix packaging, ASR transcription correction, video comparison quality analysis, comprehensive QA testing infrastructure, prompt optimization with EARS methodology, session history recovery, documentation cleanup, PDF generation with Chinese font support, CLAUDE.md progressive disclosure optimization, CCPM skill registry search and management, Promptfoo LLM evaluation framework, and iOS app development with XcodeGen and SwiftUI",
-"version": "1.18.0",
+"version": "1.18.1",
"homepage": "https://github.com/daymade/claude-code-skills"
},
"plugins": [
@@ -32,10 +32,10 @@
},
{
"name": "markdown-tools",
-"description": "Convert documents (PDFs, Word, PowerPoint, Confluence exports) to markdown with Windows/WSL path handling support",
+"description": "Convert documents (PDFs, Word, PowerPoint, Confluence exports) to markdown with Windows/WSL path handling and PDF image extraction support",
"source": "./",
"strict": false,
-"version": "1.0.0",
+"version": "1.1.0",
"category": "document-conversion",
"keywords": ["markdown", "pdf", "docx", "confluence", "markitdown", "wsl"],
"skills": ["./markdown-tools"]
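The install notes in this commit say plugins are installed as `<plugin>@<marketplace>`, and that the marketplace name (`daymade-skills`) comes from marketplace.json rather than from the repository path. Below is a minimal sketch that derives those identifiers from a manifest shaped like the one in this hunk. It assumes the marketplace name sits in a top-level `name` key, which is outside the part of the file shown in the diff. The trimmed manifest and the `install_specs` helper are illustrative and are not part of the repository.

```python
import json

# Trimmed, illustrative manifest. Assumption: the marketplace name is stored in a
# top-level "name" key; only plugin fields visible in this diff are reproduced.
MANIFEST = """
{
  "name": "daymade-skills",
  "metadata": {"version": "1.18.1"},
  "plugins": [
    {"name": "markdown-tools", "version": "1.1.0", "skills": ["./markdown-tools"]}
  ]
}
"""


def install_specs(manifest_text: str) -> list[str]:
    """Build `plugin@marketplace` identifiers for `claude plugin install`."""
    manifest = json.loads(manifest_text)
    # Use the manifest's own name, not the repo path "daymade/claude-code-skills".
    marketplace = manifest["name"]
    return [f"{plugin['name']}@{marketplace}" for plugin in manifest["plugins"]]


if __name__ == "__main__":
    for spec in install_specs(MANIFEST):
        print(f"claude plugin install {spec}")  # -> claude plugin install markdown-tools@daymade-skills
```

This is why `@daymade/claude-code-skills` fails: the suffix has to match the manifest's name field, not the GitHub path.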

@@ -25,6 +25,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Security
- None
## [1.18.1] - 2025-12-28
### Changed
- **markdown-tools**: Enhanced with PDF image extraction capability
- Added `extract_pdf_images.py` script using PyMuPDF
- Refactored SKILL.md for clearer workflow documentation
- Updated installation instructions to use `markitdown[pdf]` extra
- Updated marketplace version from 1.18.0 to 1.18.1
## [1.18.0] - 2025-12-20
### Added
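The changelog states that the project follows Semantic Versioning. In this release, the marketplace moves from 1.18.0 to 1.18.1, while markdown-tools goes from 1.0.0 to 1.1.0. Below is a minimal sketch for classifying such a bump, assuming plain `MAJOR.MINOR.PATCH` strings. `bump_kind` is an illustrative name, not a tool in the repository.

```python
def bump_kind(old: str, new: str) -> str:
    """Classify a MAJOR.MINOR.PATCH change as "major", "minor", "patch", "none", or "downgrade".

    Assumes plain numeric versions; pre-release and build suffixes such as
    1.0.0-rc.1 or 1.0.0+build.5 are not handled.
    """
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {old!r} -> {new!r}")
    if new_parts == old_parts:
        return "none"
    if new_parts < old_parts:
        return "downgrade"
    # For an increase, the first component that differs decides the kind of bump.
    for kind, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return kind
    return "none"  # unreachable: equal versions were handled above


if __name__ == "__main__":
    print(bump_kind("1.18.0", "1.18.1"))  # marketplace metadata -> patch
    print(bump_kind("1.0.0", "1.1.0"))    # markdown-tools plugin -> minor
```

Under SemVer, a MINOR increment signals backward-compatible new functionality. That matches markdown-tools gaining PDF image extraction.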

@@ -32,6 +32,18 @@ Skills use a three-level loading system:
### Installation Scripts
**In Claude Code (in-app):**
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select **skill-creator**
4. Select **Install now**
**From your terminal (CLI):**
```bash
# Automated installation (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/daymade/claude-code-skills/main/scripts/install.sh | bash
@@ -73,6 +85,8 @@ cp -r skill-name ~/.claude/skills/
# Then restart Claude Code
```
In Claude Code, use `/plugin ...` slash commands. In your terminal, use `claude plugin ...`.
### Git Operations
This repository uses standard git workflow:

@@ -8,6 +8,18 @@ Get started with Claude Code Skills Marketplace in less than 2 minutes!
### Step 1: Install skill-creator
**In Claude Code (in-app):**
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select **skill-creator**
4. Select **Install now**
**From your terminal (CLI):**
```bash
# Add the marketplace
claude plugin marketplace add https://github.com/daymade/claude-code-skills
@@ -107,7 +119,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
# Marketplace name: daymade-skills (from marketplace.json)
# Use @daymade-skills in install commands (e.g., skill-name@daymade-skills)
-# Do not use /plugin; all commands are `claude plugin ...`
+# In Claude Code use `/plugin ...`; in your terminal use `claude plugin ...`
# Step 2: Install skills you need
claude plugin install github-ops@daymade-skills
claude plugin install markdown-tools@daymade-skills

@@ -8,6 +8,18 @@
### Step 1: Install skill-creator
**In Claude Code (in-app):**
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select **skill-creator**
4. Select **Install now**
**From your terminal (CLI):**
```bash
# Add the marketplace
claude plugin marketplace add https://github.com/daymade/claude-code-skills
@@ -107,7 +119,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
# Marketplace name: daymade-skills (from marketplace.json)
# Use @daymade-skills in install commands (e.g., skill-name@daymade-skills)
-# All commands should use `claude plugin ...` (there is no `/plugin` command)
+# In Claude Code use `/plugin ...`; in your terminal use `claude plugin ...`
# Step 2: Install the skills you need
claude plugin install github-ops@daymade-skills
claude plugin install markdown-tools@daymade-skills

@@ -48,6 +48,18 @@ The `skill-creator` is the **meta-skill** that enables you to build, validate, a
### Quick Install
**In Claude Code (in-app):**
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select **skill-creator**
4. Select **Install now**
**From your terminal (CLI):**
```bash
claude plugin marketplace add https://github.com/daymade/claude-code-skills
# Marketplace name: daymade-skills (from marketplace.json)
@@ -88,6 +100,18 @@ Claude Code, with skill-creator loaded, will guide you through the entire skill
## 🚀 Quick Installation
### Install Inside Claude Code (In-App)
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select the plugin you want
4. Select **Install now**
### Automated Installation (Recommended)
**macOS/Linux:**
@@ -109,7 +133,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
Marketplace name is `daymade-skills` (from marketplace.json). Use `@daymade-skills` when installing plugins.
Do not use the repo path as a marketplace name (e.g. `@daymade/claude-code-skills` will fail).
-All plugin commands should use `claude plugin ...` (there is no `/plugin` command).
+In Claude Code, use `/plugin ...` slash commands. In your terminal, use `claude plugin ...`.
**Essential Skill** (recommended first install):
```bash
@@ -242,20 +266,20 @@ Comprehensive GitHub operations using gh CLI and GitHub API.
### 2. **markdown-tools** - Document Conversion Suite
-Converts documents to markdown with Windows/WSL path handling and Obsidian integration.
+Converts documents to markdown with Windows/WSL path handling and PDF image extraction.
**When to use:**
- Converting .doc/.docx/PDF/PPTX to markdown
+- Extracting images from PDF files
- Processing Confluence exports
- Handling Windows/WSL path conversions
- Working with markitdown utility
**Key features:**
- Multi-format document conversion
-- Confluence export processing
+- PDF image extraction using PyMuPDF
- Windows/WSL path automation
-- Obsidian vault integration
-- Helper scripts for path conversion
+- Confluence export processing
+- Helper scripts for path conversion and image extraction
**🎬 Live Demo**

@@ -48,6 +48,18 @@
### Quick Install
**In Claude Code (in-app):**
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select **skill-creator**
4. Select **Install now**
**From your terminal (CLI):**
```bash
claude plugin marketplace add https://github.com/daymade/claude-code-skills
# Marketplace name: daymade-skills (from marketplace.json)
@@ -88,6 +100,18 @@ claude plugin install skill-creator@daymade-skills
## 🚀 Quick Installation
### Install Inside Claude Code (In-App)
```text
/plugin marketplace add daymade/claude-code-skills
```
Then:
1. Select **Browse and install plugins**
2. Select **daymade/claude-code-skills**
3. Select the plugin you want
4. Select **Install now**
### Automated Installation (Recommended)
**macOS/Linux:**
@@ -109,7 +133,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
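Marketplace name is `daymade-skills` (from marketplace.json). Use `@daymade-skills` when installing plugins.
Do not use the repo path as a marketplace name (e.g. `@daymade/claude-code-skills` will fail).
-All plugin commands should use `claude plugin ...` (there is no `/plugin` command).
+In Claude Code, use `/plugin ...` slash commands. In your terminal, use `claude plugin ...`.
**Essential Skill** (recommended first install):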
```bash
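@@ -264,20 +288,20 @@ CC-Switch supports the following Chinese AI service providers:
### 2. **markdown-tools** - Document Conversion Suite
-Converts documents to markdown, with Windows/WSL path handling and Obsidian integration.
+Converts documents to markdown, with Windows/WSL path handling and PDF image extraction.
**When to use:**
- Converting .doc/.docx/PDF/PPTX to markdown
+- Extracting images from PDF files
- Processing Confluence exports
- Handling Windows/WSL path conversions
- Working with the markitdown utility
**Key features:**
- Multi-format document conversion
-- Confluence export processing
+- PDF image extraction (using PyMuPDF)
- Windows/WSL path automation
-- Obsidian vault integration
-- Path conversion helper scripts
+- Confluence export processing
+- Path conversion and image extraction helper scripts
**🎬 Live Demo**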

@@ -1,146 +1,93 @@
---
name: markdown-tools
-description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, or working with markitdown utility.
+description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.
---
# Markdown Tools
-## Overview
-This skill provides document conversion to markdown with Windows/WSL path handling support. It helps convert various document formats to markdown and handles path conversions between Windows and WSL environments.
-## Core Capabilities
-### 1. Markdown Conversion
-Convert documents to markdown format with automatic Windows/WSL path handling.
-### 2. Confluence Export Processing
-Handle Confluence .doc exports with special characters for knowledge base integration.
+Convert documents to markdown with image extraction and Windows/WSL path handling.
## Quick Start
-### Convert Any Document to Markdown
+### Install markitdown with PDF Support
```bash
-# Basic conversion
-markitdown "path/to/document.pdf" > output.md
+# IMPORTANT: Use [pdf] extra for PDF support
+uv tool install "markitdown[pdf]"
-# WSL path example
-markitdown "/mnt/c/Users/username/Documents/file.docx" > output.md
+# Or via pip
+pip install "markitdown[pdf]"
```
-See `references/conversion-examples.md` for detailed examples of various conversion scenarios.
-### Convert Confluence Export
+### Basic Conversion
```bash
-# Direct conversion for simple exports
-markitdown "confluence-export.doc" > output.md
-# For exports with special characters, see references/
+markitdown "document.pdf" -o output.md
+# Or redirect: markitdown "document.pdf" > output.md
```
-## Path Conversion
+## PDF Conversion with Images
-### Windows to WSL Path Format
+markitdown extracts text only. For PDFs with images, use this workflow:
-Windows paths must be converted to WSL format before use in bash commands.
-**Conversion rules:**
-- Replace `C:\` with `/mnt/c/`
-- Replace `\` with `/`
-- Preserve spaces and special characters
-- Use quotes for paths with spaces
-**Example conversions:**
-```bash
-# Windows path
-C:\Users\username\Documents\file.doc
-# WSL path
-/mnt/c/Users/username/Documents/file.doc
-```
-**Helper script:** Use `scripts/convert_path.py` to automate conversion:
+### Step 1: Convert Text
```bash
-python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
+markitdown "document.pdf" -o output.md
```
-See `references/conversion-examples.md` for detailed path conversion examples.
+### Step 2: Extract Images
-## Document Conversion Workflows
-### Workflow 1: Simple Markdown Conversion
-For straightforward document conversions (PDF, .docx without special characters):
-1. Convert Windows path to WSL format (if needed)
-2. Run markitdown
-3. Redirect output to .md file
-See `references/conversion-examples.md` for detailed examples.
-### Workflow 2: Confluence Export with Special Characters
-For Confluence .doc exports that contain special characters or complex formatting:
-1. Save .doc file to accessible location
-2. Use appropriate conversion method (see references)
-3. Verify output formatting
-See `references/conversion-examples.md` for step-by-step command examples.
-## Error Handling
-### Common Issues and Solutions
-**markitdown not found:**
```bash
-# Install markitdown via pip
-pip install markitdown
+# Create assets directory alongside the markdown
+mkdir -p assets
-# Or via uv tools
-uv tool install markitdown
+# Extract images using PyMuPDF
+uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
```
-**Path not found:**
+### Step 3: Add Image References
+Insert image references in the markdown where needed:
+```markdown
+![Description](assets/img_page1_1.png)
+```
+### Step 4: Format Cleanup
+markitdown output often needs manual fixes:
+- Add proper heading levels (`#`, `##`, `###`)
+- Reconstruct tables in markdown format
+- Fix broken line breaks
+- Restore indentation structure
+## Path Conversion (Windows/WSL)
```bash
-# Verify path exists
-ls -la "/mnt/c/Users/username/Documents/file.doc"
+# Windows → WSL conversion
+C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf
-# Use convert_path.py helper
-python scripts/convert_path.py "C:\Users\username\Documents\file.doc"
+# Use helper script
+python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
```
-**Encoding issues:**
-- Ensure files are UTF-8 encoded
-- Check for special characters in filenames
-- Use quotes around paths with spaces
+## Common Issues
+**"dependencies needed to read .pdf files"**
+```bash
+# Install with PDF support
+uv tool install "markitdown[pdf]" --force
+```
+**FontBBox warnings during PDF conversion**
+- These are harmless font parsing warnings, output is still correct
+**Images missing from output**
+- Use `scripts/extract_pdf_images.py` to extract images separately
## Resources
-### references/conversion-examples.md
-Comprehensive examples for all conversion scenarios including:
-- Simple document conversions (PDF, Word, PowerPoint)
-- Confluence export handling
-- Path conversion examples for Windows/WSL
-- Batch conversion operations
-- Error recovery and troubleshooting examples
-Load this reference when users need specific command examples or encounter conversion issues.
-### scripts/convert_path.py
-Python script to automate Windows to WSL path conversion. Handles:
-- Drive letter conversion (C:\ → /mnt/c/)
-- Backslash to forward slash
-- Special characters and spaces
-## Best Practices
-1. **Convert Windows paths to WSL format** before bash operations
-2. **Verify paths exist** before operations using ls or test commands
-3. **Check output quality** after conversion
-4. **Use markitdown directly** for simple conversions
-5. **Test incrementally** - Verify each conversion step before proceeding
-6. **Preserve directory structure** when doing batch conversions
+- `scripts/extract_pdf_images.py` - Extract images from PDF using PyMuPDF
+- `scripts/convert_path.py` - Windows to WSL path converter
+- `references/conversion-examples.md` - Detailed examples for batch operations
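The SKILL.md changes describe the Windows → WSL mapping (`C:\` becomes `/mnt/c/`, backslashes become slashes, spaces are preserved) and delegate it to `scripts/convert_path.py`. That script's source is not part of this diff. Below is a minimal sketch of the same mapping, built on the standard library's `pathlib.PureWindowsPath`. `windows_to_wsl` is a hypothetical name, not the helper script's interface. The sketch assumes WSL's default automount root of `/mnt`, which can be changed through the `root` option of the `[automount]` section in `/etc/wsl.conf`.

```python
from pathlib import PureWindowsPath


def windows_to_wsl(path: str, mount_root: str = "/mnt") -> str:
    """Map a Windows path like C:\\Users\\name\\file.pdf to /mnt/c/Users/name/file.pdf.

    Hypothetical helper: drive-letter paths go under <mount_root>/<drive>/;
    anything else (relative or UNC paths) only has its backslashes flipped.
    """
    win = PureWindowsPath(path)
    drive = win.drive  # "C:" for drive-letter paths, empty for relative ones
    if len(drive) == 2 and drive[1] == ":":
        # parts[0] is the anchor such as C:\ ; the remaining parts are the components.
        components = win.parts[1:]
        # WSL mounts drives under lowercase letters; rstrip lets mount_root="/" work too.
        return "/".join([mount_root.rstrip("/"), drive[0].lower(), *components])
    return path.replace("\\", "/")


if __name__ == "__main__":
    print(windows_to_wsl(r"C:\Users\name\Documents\file.pdf"))  # -> /mnt/c/Users/name/Documents/file.pdf
```

Spaces survive the conversion unchanged, so quote the result when passing it to a shell command, for example `markitdown "/mnt/d/My Files/report final.pdf" -o output.md`.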

@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""
Extract images from PDF files using PyMuPDF.

Usage:
    uv run --with pymupdf python extract_pdf_images.py <pdf_path> [output_dir]

Examples:
    uv run --with pymupdf python extract_pdf_images.py document.pdf
    uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets

Output:
    Images are saved to output_dir (default: ./assets) with names like:
    - img_page1_1.png
    - img_page2_1.png
"""

import sys
import os


def extract_images(pdf_path: str, output_dir: str = "assets") -> list[str]:
    """
    Extract all images from a PDF file.

    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save extracted images

    Returns:
        List of extracted image file paths
    """
    try:
        import fitz  # PyMuPDF
    except ImportError:
        print("Error: PyMuPDF not installed. Run with:")
        print(' uv run --with pymupdf python extract_pdf_images.py <pdf_path>')
        sys.exit(1)

    os.makedirs(output_dir, exist_ok=True)

    doc = fitz.open(pdf_path)
    extracted_files = []

    for page_num in range(len(doc)):
        page = doc[page_num]
        image_list = page.get_images()

        for img_index, img in enumerate(image_list):
            # The first entry of each get_images() tuple is the image's xref number.
            xref = img[0]
            base_image = doc.extract_image(xref)
            image_bytes = base_image["image"]
            image_ext = base_image["ext"]

            # Create descriptive filename
            img_filename = f"img_page{page_num + 1}_{img_index + 1}.{image_ext}"
            img_path = os.path.join(output_dir, img_filename)

            with open(img_path, "wb") as f:
                f.write(image_bytes)

            extracted_files.append(img_path)
            print(f"Extracted: {img_filename} ({len(image_bytes):,} bytes)")

    doc.close()

    print(f"\nTotal: {len(extracted_files)} images extracted to {output_dir}/")
    return extracted_files


def main():
    if len(sys.argv) < 2 or sys.argv[1] in ("-h", "--help"):
        print("Extract images from PDF files using PyMuPDF.")
        print()
        print("Usage: python extract_pdf_images.py <pdf_path> [output_dir]")
        print()
        print("Arguments:")
        print(" pdf_path Path to the PDF file")
        print(" output_dir Directory to save images (default: ./assets)")
        print()
        print("Example:")
        print(" uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets")
        # Exit 0 when help was requested explicitly, 1 when arguments are missing.
        sys.exit(0 if "--help" in sys.argv or "-h" in sys.argv else 1)

    pdf_path = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else "assets"

    if not os.path.exists(pdf_path):
        print(f"Error: File not found: {pdf_path}")
        sys.exit(1)

    extract_images(pdf_path, output_dir)


if __name__ == "__main__":
    main()
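Step 3 of the updated SKILL.md adds references such as `![Description](assets/img_page1_1.png)` by hand. The script above names its files `img_page{N}_{M}.{ext}` and returns their paths, so a small helper can turn that list into markdown links grouped by page. The helper below, `image_references`, is hypothetical and not part of this commit. It relies only on the naming scheme shown in the script, and its alt text is a placeholder.

```python
import re
from pathlib import Path

# Matches names written by extract_pdf_images.py, e.g. "img_page3_2.png".
NAME_PATTERN = re.compile(r"^img_page(\d+)_(\d+)\.\w+$")


def image_references(paths: list[str]) -> dict[int, list[str]]:
    """Group extracted image paths by PDF page and render a markdown image link for each.

    The alt text "Page N image M" is a placeholder to be replaced with a real description.
    """
    by_page: dict[int, list[tuple[int, str]]] = {}
    for path in paths:
        p = Path(path)
        match = NAME_PATTERN.match(p.name)
        if match is None:
            continue  # not produced by the extractor, so skip it
        page, index = int(match.group(1)), int(match.group(2))
        # as_posix() keeps forward slashes; pathlib also drops a leading "./".
        link = f"![Page {page} image {index}]({p.as_posix()})"
        by_page.setdefault(page, []).append((index, link))
    # Order pages numerically (page 2 before page 10) and images by their per-page index.
    return {page: [link for _, link in sorted(items)] for page, items in sorted(by_page.items())}


if __name__ == "__main__":
    extracted = ["assets/img_page10_1.png", "assets/img_page2_2.jpeg", "./assets/img_page2_1.png"]
    for page, links in image_references(extracted).items():
        print(f"<!-- page {page} -->")
        print("\n".join(links))
```

Sorting on the parsed page number, rather than on the file name, keeps `img_page2_*` ahead of `img_page10_*`. Plain string sorting would not do that. Each link still has to be moved next to its matching text by hand.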