Release v1.18.1: Enhance markdown-tools with PDF image extraction

- Add extract_pdf_images.py script using PyMuPDF
- Refactor SKILL.md for clearer workflow documentation
- Update installation to use markitdown[pdf] extra
- Update marketplace version to 1.18.1
- Update markdown-tools version to 1.1.0
- Update README/README.zh-CN with new features
- Update QUICKSTART docs with in-app install instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,146 +1,93 @@
---
name: markdown-tools
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, or working with markitdown utility.
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.
---

# Markdown Tools

## Overview

This skill provides document conversion to markdown with Windows/WSL path handling support. It helps convert various document formats to markdown and handles path conversions between Windows and WSL environments.

## Core Capabilities

### 1. Markdown Conversion
Convert documents to markdown format with automatic Windows/WSL path handling.

### 2. Confluence Export Processing
Handle Confluence .doc exports with special characters for knowledge base integration.
Convert documents to markdown with image extraction and Windows/WSL path handling.

## Quick Start

### Convert Any Document to Markdown
### Install markitdown with PDF Support

```bash
# Basic conversion
markitdown "path/to/document.pdf" > output.md
# IMPORTANT: Use [pdf] extra for PDF support
uv tool install "markitdown[pdf]"

# WSL path example
markitdown "/mnt/c/Users/username/Documents/file.docx" > output.md
# Or via pip
pip install "markitdown[pdf]"
```

See `references/conversion-examples.md` for detailed examples of various conversion scenarios.

### Convert Confluence Export
### Basic Conversion

```bash
# Direct conversion for simple exports
markitdown "confluence-export.doc" > output.md

# For exports with special characters, see references/
markitdown "document.pdf" -o output.md
# Or redirect: markitdown "document.pdf" > output.md
```

## Path Conversion
## PDF Conversion with Images

### Windows to WSL Path Format
markitdown extracts text only. For PDFs with images, use this workflow:

Windows paths must be converted to WSL format before use in bash commands.

**Conversion rules:**
- Replace `C:\` with `/mnt/c/`
- Replace `\` with `/`
- Preserve spaces and special characters
- Use quotes for paths with spaces

**Example conversions:**
```bash
# Windows path
C:\Users\username\Documents\file.doc

# WSL path
/mnt/c/Users/username/Documents/file.doc
```
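
The conversion rules above can be sketched in Python. This is a minimal illustration, not the contents of `scripts/convert_path.py` (that script is not shown in this diff); the function name `windows_to_wsl` is hypothetical, and the sketch assumes drive-letter paths mounted under `/mnt/<drive>`.

```python
import re


def windows_to_wsl(path: str) -> str:
    """Convert a Windows drive path to its /mnt/<drive>/ form (illustrative only)."""
    match = re.match(r"^([A-Za-z]):[\\/]*(.*)$", path)
    if match is None:
        # Not a drive-letter path: only normalize the separators.
        return path.replace("\\", "/")
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")  # spaces and other characters are left untouched
    mount = f"/mnt/{drive.lower()}"
    return f"{mount}/{rest}" if rest else mount


print(windows_to_wsl(r"C:\Users\username\Documents\file.doc"))
# -> /mnt/c/Users/username/Documents/file.doc
```

Because spaces are preserved rather than escaped, the result still needs quoting when it is passed to a shell command.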

**Helper script:** Use `scripts/convert_path.py` to automate conversion:
### Step 1: Convert Text

```bash
python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
markitdown "document.pdf" -o output.md
```

See `references/conversion-examples.md` for detailed path conversion examples.
### Step 2: Extract Images

## Document Conversion Workflows

### Workflow 1: Simple Markdown Conversion

For straightforward document conversions (PDF, .docx without special characters):

1. Convert Windows path to WSL format (if needed)
2. Run markitdown
3. Redirect output to .md file

See `references/conversion-examples.md` for detailed examples.

### Workflow 2: Confluence Export with Special Characters

For Confluence .doc exports that contain special characters or complex formatting:

1. Save .doc file to accessible location
2. Use appropriate conversion method (see references)
3. Verify output formatting

See `references/conversion-examples.md` for step-by-step command examples.

## Error Handling

### Common Issues and Solutions

**markitdown not found:**
```bash
# Install markitdown via pip
pip install markitdown
# Create assets directory alongside the markdown
mkdir -p assets

# Or via uv tools
uv tool install markitdown
# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
```
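
For orientation, here is a rough sketch of how image extraction with PyMuPDF can work. It is not the repository's `scripts/extract_pdf_images.py`, whose contents are not part of this diff; the function names and the `pageNNN_imgNN` file naming are assumptions made for illustration. It needs the `pymupdf` package, for example through `uv run --with pymupdf` as in the command above.

```python
from pathlib import Path


def image_name(page_number: int, image_index: int, ext: str) -> str:
    """Build a sortable file name (hypothetical naming scheme)."""
    return f"page{page_number:03d}_img{image_index:02d}.{ext}"


def extract_images(pdf_path: str, out_dir: str) -> list[Path]:
    """Write every embedded image of the PDF into out_dir and return the paths."""
    import fitz  # PyMuPDF, imported lazily so the helper above works without it

    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    seen: set[int] = set()  # the same image object can be referenced on several pages
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for image_index, info in enumerate(page.get_images(full=True), start=1):
                xref = info[0]  # first tuple item is the image's cross-reference number
                if xref in seen:
                    continue
                seen.add(xref)
                image = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                target = target_dir / image_name(page_number, image_index, image["ext"])
                target.write_bytes(image["image"])
                saved.append(target)
    return saved
```

Calling `extract_images("document.pdf", "./assets")` would mirror the command-line invocation shown above, under the naming assumption stated here.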

**Path not found:**
### Step 3: Add Image References

Insert image references in the markdown where needed:

```markdown

```
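
Writing the references by hand gets tedious for PDFs with many images. The sketch below generates one markdown image reference per extracted file; the function name `image_references`, the suffix list, and the use of the file stem as alt text are assumptions for illustration, not part of the skill.

```python
from pathlib import PurePosixPath

# Suffixes treated as images (an assumption; adjust to what the extractor produces).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def image_references(filenames, assets_dir="assets"):
    """Return markdown image references for the image files, sorted by name."""
    references = []
    for name in sorted(filenames):
        path = PurePosixPath(name)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # Alt text defaults to the file stem; replace it with a real caption later.
            references.append(f"![{path.stem}]({assets_dir}/{path.name})")
    return references


for line in image_references(["page002_img01.png", "page001_img01.jpeg", "notes.txt"]):
    print(line)
```

The generated lines still have to be moved to the right place in the text by hand, since markitdown output carries no image positions.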

### Step 4: Format Cleanup

markitdown output often needs manual fixes:
- Add proper heading levels (`#`, `##`, `###`)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
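
Of the fixes listed above, broken line breaks are the easiest to partly automate. The following is a minimal sketch under simple assumptions: consecutive plain-text lines belong to one paragraph, while headings, list items, table rows, quotes, and fenced code are left alone. The function name `join_wrapped_lines` is hypothetical, and the output still needs a manual review.

```python
import re

STRUCTURAL_PREFIXES = ("#", "-", "*", "|", ">")


def join_wrapped_lines(text: str) -> str:
    """Merge hard-wrapped paragraph lines; keep structural lines and code fences."""
    out: list[str] = []
    paragraph: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the buffered paragraph as a single line.
        if paragraph:
            out.append(" ".join(paragraph))
            paragraph.clear()

    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("```"):
            flush()
            in_fence = not in_fence
            out.append(raw)
        elif in_fence:
            out.append(raw)  # never touch code
        elif not stripped:
            flush()
            out.append("")
        elif stripped.startswith(STRUCTURAL_PREFIXES) or re.match(r"\d+[.)]\s", stripped):
            flush()
            out.append(raw)
        else:
            paragraph.append(stripped)
    flush()
    return "\n".join(out)
```

Lines that start with `*` are treated as structural even when they are bold text rather than list items, which errs on the side of leaving the input unchanged.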

## Path Conversion (Windows/WSL)

```bash
# Verify path exists
ls -la "/mnt/c/Users/username/Documents/file.doc"
# Windows → WSL conversion
C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf

# Use convert_path.py helper
python scripts/convert_path.py "C:\Users\username\Documents\file.doc"
# Use helper script
python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
```

**Encoding issues:**
- Ensure files are UTF-8 encoded
- Check for special characters in filenames
- Use quotes around paths with spaces
## Common Issues

**"dependencies needed to read .pdf files"**
```bash
# Install with PDF support
uv tool install "markitdown[pdf]" --force
```

**FontBBox warnings during PDF conversion**
- These are harmless font parsing warnings, output is still correct

**Images missing from output**
- Use `scripts/extract_pdf_images.py` to extract images separately

## Resources

### references/conversion-examples.md
Comprehensive examples for all conversion scenarios including:
- Simple document conversions (PDF, Word, PowerPoint)
- Confluence export handling
- Path conversion examples for Windows/WSL
- Batch conversion operations
- Error recovery and troubleshooting examples

Load this reference when users need specific command examples or encounter conversion issues.

### scripts/convert_path.py
Python script to automate Windows to WSL path conversion. Handles:
- Drive letter conversion (C:\ → /mnt/c/)
- Backslash to forward slash
- Special characters and spaces

## Best Practices

1. **Convert Windows paths to WSL format** before bash operations
2. **Verify paths exist** before operations using ls or test commands
3. **Check output quality** after conversion
4. **Use markitdown directly** for simple conversions
5. **Test incrementally** - Verify each conversion step before proceeding
6. **Preserve directory structure** when doing batch conversions
- `scripts/extract_pdf_images.py` - Extract images from PDF using PyMuPDF
- `scripts/convert_path.py` - Windows to WSL path converter
- `references/conversion-examples.md` - Detailed examples for batch operations
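
The batch-conversion practice listed above (preserve directory structure) comes down to mapping each source file to a `.md` path with the same relative location. A small sketch, assuming POSIX-style (WSL) paths; the function names are hypothetical, and the only markitdown option used is the `-o` flag that appears in this file.

```python
from pathlib import PurePosixPath


def output_path(source: str, input_root: str, output_root: str) -> str:
    """Mirror the source's location under input_root into output_root, as .md."""
    relative = PurePosixPath(source).relative_to(input_root)
    return str(PurePosixPath(output_root) / relative.with_suffix(".md"))


def conversion_command(source: str, target: str) -> list[str]:
    """Argument list for one conversion, suitable for subprocess.run."""
    return ["markitdown", source, "-o", target]


target = output_path("/mnt/c/docs/team/report.pdf", "/mnt/c/docs", "out")
print(conversion_command("/mnt/c/docs/team/report.pdf", target))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces; the target's parent directory has to exist before the command runs.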