---
name: markdown-tools
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with the markitdown utility.
---

# Markdown Tools

Convert documents to markdown with image extraction and Windows/WSL path handling.

## Quick Start

### Install markitdown with PDF Support

```bash
# IMPORTANT: use the [pdf] extra; without it markitdown cannot read PDF files
uv tool install "markitdown[pdf]"

# Or via pip
pip install "markitdown[pdf]"
```

### Basic Conversion

```bash
markitdown "document.pdf" -o output.md
# Or redirect: markitdown "document.pdf" > output.md
```
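
For a folder of PDFs, the basic command can run in a loop. A minimal sketch, assuming `markitdown` is on `PATH` and that each `.md` file should sit next to its source PDF; `convert_all_pdfs` is a hypothetical helper, not part of this skill (the skill's own batch examples live in `references/conversion-examples.md`):

```bash
# Hypothetical helper: convert every PDF in a directory to markdown,
# writing each result next to its source (report.pdf -> report.md).
convert_all_pdfs() {
  local dir="${1:-.}" pdf
  for pdf in "$dir"/*.pdf; do
    # Without nullglob, an unmatched glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    markitdown "$pdf" -o "${pdf%.pdf}.md" || echo "failed: $pdf" >&2
  done
}
```

Call it as `convert_all_pdfs ~/Documents/specs`. The glob is case-sensitive, so files named `*.PDF` are skipped; a failed conversion is reported on stderr and the loop moves on to the next file.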

## PDF Conversion with Images

markitdown extracts only the text of a PDF; embedded images do not appear in its output. For PDFs with images, use this four-step workflow:

### Step 1: Convert Text

```bash
markitdown "document.pdf" -o output.md
```

### Step 2: Extract Images

```bash
# Create an assets directory next to the markdown file
mkdir -p assets

# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
```
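
Steps 1 and 2 can be wrapped in a single function so a document's markdown and its `assets/` directory are always produced together. A minimal sketch, assuming it runs from the skill's root directory (so the relative `scripts/extract_pdf_images.py` path resolves) and that the script takes the PDF and an output directory, as in the command above; `pdf_to_markdown` is a hypothetical name:

```bash
# Hypothetical wrapper around Steps 1 and 2; run from the skill's root directory.
pdf_to_markdown() {
  local pdf="$1"
  local out_md="${2:-${pdf%.pdf}.md}"        # default: report.pdf -> report.md
  local assets_dir
  assets_dir="$(dirname "$out_md")/assets"   # assets/ sits next to the markdown

  # Step 1: convert the text; stop if markitdown fails
  markitdown "$pdf" -o "$out_md" || return 1

  # Step 2: extract the embedded images with PyMuPDF
  mkdir -p "$assets_dir"
  uv run --with pymupdf python scripts/extract_pdf_images.py "$pdf" "$assets_dir"
}
```

`pdf_to_markdown docs/report.pdf` writes `docs/report.md` and fills `docs/assets/`. Keeping `assets/` beside the markdown lets the image references in Step 3 use a short relative path.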

### Step 3: Add Image References

Insert an image reference wherever an extracted image belongs in the markdown:

```markdown
![Description of the image](assets/<image-file>)
```
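
Rather than typing each reference by hand, a short loop can print one per extracted file for you to paste at the right spots. A sketch, assuming it runs in the directory that holds both the markdown file and `assets/`; the alt text is only the file name and should be replaced with a real description, and `list_image_refs` is a hypothetical helper:

```bash
# Hypothetical helper: print one markdown image reference per file in a directory.
list_image_refs() {
  local dir="${1:-assets}" img
  for img in "$dir"/*; do
    [ -f "$img" ] || continue   # skip the literal glob when the directory is empty
    # Alt text defaults to the bare file name; edit it afterwards.
    printf '![%s](%s)\n' "$(basename "$img")" "$img"
  done
}
```

Files come out in lexical glob order (for example `10.png` before `2.png`), which may not match their order in the document, so move each line to where its image belongs.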

### Step 4: Format Cleanup

markitdown output often needs manual fixes:
- Add proper heading levels (`#`, `##`, `###`)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
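
Headings, tables, and indentation need human judgment, but the blank-line noise around broken line breaks is mechanical enough to script. A minimal sketch, not one of the skill's scripts: it empties whitespace-only lines and then squeezes each run of blank lines into one with `cat -s`; `tidy_markdown` is a hypothetical name:

```bash
# Hypothetical first-pass cleanup for markitdown output.
tidy_markdown() {
  # 1) Empty out lines that contain only whitespace, so they count as blank.
  # 2) cat -s squeezes each run of blank lines down to a single blank line.
  sed 's/^[[:space:]]*$//' "$1" | cat -s
}
```

Write to a new file (`tidy_markdown output.md > output.clean.md`) rather than redirecting onto `output.md` itself, which the shell would truncate before `sed` reads it. Lines with content are left untouched, including two-space hard line breaks.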

## Path Conversion (Windows/WSL)

```bash
# Windows → WSL conversion:
#   C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf

# Use the helper script (single quotes keep the backslashes literal)
python scripts/convert_path.py 'C:\Users\name\Documents\file.pdf'
```
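
Inside WSL, the `wslpath` utility performs this conversion directly (`wslpath 'C:\Users\name\file.pdf'`). Where neither it nor the helper script is at hand, the mapping is simple enough for a small shell function. A sketch that assumes a drive-letter path (`X:\...`) and the default `/mnt` mount root; UNC paths (`\\server\share`) are not handled, and `win_to_wsl` is a hypothetical name:

```bash
# Hypothetical converter: C:\Users\name\file.pdf -> /mnt/c/Users/name/file.pdf
win_to_wsl() {
  local p="$1"
  local re='^[A-Za-z]:\\'             # drive letter, colon, backslash
  if [[ ! $p =~ $re ]]; then
    echo "not a drive-letter path: $p" >&2
    return 1
  fi
  local drive="${p:0:1}"              # "C" from "C:\Users\..."
  local rest="${p:2}"                 # "\Users\..." (drops the "C:")
  rest="${rest//\\//}"                # backslashes -> forward slashes
  printf '/mnt/%s%s\n' "${drive,,}" "$rest"   # lowercase drive letter
}
```

`win_to_wsl 'C:\Users\name\file.pdf'` prints `/mnt/c/Users/name/file.pdf`. The `${drive,,}` lowercase expansion needs bash 4 or newer.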

## Common Issues

**"dependencies needed to read .pdf files"**
```bash
# Reinstall with the PDF extra (--force replaces the existing installation)
uv tool install "markitdown[pdf]" --force
```

**FontBBox warnings during PDF conversion**
- These are harmless font-parsing warnings; the output is still correct.

**Images missing from output**
- markitdown extracts text only; use `scripts/extract_pdf_images.py` to extract images separately (see PDF Conversion with Images).

## Resources

- `scripts/extract_pdf_images.py` - Extract images from PDF using PyMuPDF
- `scripts/convert_path.py` - Windows to WSL path converter
- `references/conversion-examples.md` - Detailed examples for batch operations