feat(doc-to-markdown): CJK bold spacing, JSON pretty-print, 31 tests, full rename cleanup

- Add CJK bold spacing fix: insert spaces around **bold** spans containing CJK characters for correct rendering (handles emoji adjacency, already-spaced)
- Add JSON pretty-print: auto-format JSON code blocks with 2-space indent
- Add 31 unit tests covering all post-processing functions
- Fix pandoc simple table detection (1-space column gaps)
- Fix image path double-nesting when --assets-dir ends with 'media'
- Rename all markdown-tools references across 15 files (README, QUICKSTART, marketplace.json, CLAUDE.md, meeting-minutes-taker, GitHub templates)
- Add 5-tool benchmark report (Docling/MarkItDown/Pandoc/Mammoth/ours)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 03:18:37 +08:00
parent a5f3a4bfbe
commit d9e1967689
16 changed files with 351 additions and 90 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -668,7 +668,7 @@
    },
    {
      "name": "meeting-minutes-taker",
-      "description": "Transform meeting transcripts into high-fidelity, structured meeting minutes with iterative review. Features speaker identification via feature analysis (word count, speaking style, topic focus) with context.md team directory mapping, intelligent file naming from content, integration with markdown-tools and transcript-fixer for pre-processing, evidence-based recording with speaker quotes, Mermaid diagrams for architecture discussions, and multi-turn parallel generation with UNION merge",
+      "description": "Transform meeting transcripts into high-fidelity, structured meeting minutes with iterative review. Features speaker identification via feature analysis (word count, speaking style, topic focus) with context.md team directory mapping, intelligent file naming from content, integration with doc-to-markdown and transcript-fixer for pre-processing, evidence-based recording with speaker quotes, Mermaid diagrams for architecture discussions, and multi-turn parallel generation with UNION merge",
      "source": "./",
      "strict": false,
      "version": "1.1.0",
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -16,7 +16,7 @@ Which skill is affected?
 - [ ] skill-creator
 - [ ] github-ops
-- [ ] markdown-tools
+- [ ] doc-to-markdown
 - [ ] mermaid-tools
 - [ ] statusline-generator
 - [ ] teams-channel-post-writer
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -20,7 +20,7 @@ Which skill would this enhance?
 - [ ] skill-creator
 - [ ] github-ops
-- [ ] markdown-tools
+- [ ] doc-to-markdown
 - [ ] mermaid-tools
 - [ ] statusline-generator
 - [ ] teams-channel-post-writer
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -33,7 +33,7 @@ Which skills are affected by this PR?
 - [ ] skill-creator
 - [ ] github-ops
-- [ ] markdown-tools
+- [ ] doc-to-markdown
 - [ ] mermaid-tools
 - [ ] statusline-generator
 - [ ] teams-channel-post-writer
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,8 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
-### Added
+### Changed
-- None
+- **Renamed**: `markdown-tools` → `doc-to-markdown` — clearer name for DOCX/PDF/PPTX → Markdown conversion
 - **doc-to-markdown**: Added 8 DOCX post-processing fixes (grid tables, simple tables, CJK bold spacing, JSON pretty-print, image path flattening, pandoc attribute cleanup, code block detection, bracket fixes)
 - **doc-to-markdown**: Added 31 unit tests (`test_convert.py`)
 - **doc-to-markdown**: Added 5-tool benchmark report (`references/benchmark-2026-03-22.md`)
 ## [1.39.0] - 2026-03-18
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -179,7 +179,7 @@ This applies when you change ANY file under a skill directory:
 1. **skill-creator** ⭐ - **Essential meta-skill** for creating your own skills (with init/validate/package scripts)
 2. **github-ops** - GitHub operations via gh CLI and API
-3. **markdown-tools** - Document conversion with WSL path handling
+3. **doc-to-markdown** - DOCX/PDF/PPTX → Markdown conversion with CJK post-processing
 4. **mermaid-tools** - Diagram extraction and PNG generation
 5. **statusline-generator** - Claude Code statusline customization
 6. **teams-channel-post-writer** - Teams communication templates
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -122,7 +122,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
 # In Claude Code use `/plugin ...`; in your terminal use `claude plugin ...`
 # Step 2: Install skills you need
 claude plugin install github-ops@daymade-skills
-claude plugin install markdown-tools@daymade-skills
+claude plugin install doc-to-markdown@daymade-skills
 # ... add more as needed
 # Step 3: Restart Claude Code
@@ -136,7 +136,7 @@ This table is a quick starter list. See [README.md](./README.md) for the full ca
 |-------|-------------|-------------|
 | **skill-creator** ⭐ | Create your own skills | Building custom workflows |
 | **github-ops** | GitHub operations | Managing PRs, issues, workflows |
-| **markdown-tools** | Document conversion | Converting docs to markdown |
+| **doc-to-markdown** | Document conversion | Converting docs to markdown |
 | **mermaid-tools** | Diagram generation | Creating PNG diagrams |
 | **statusline-generator** | Statusline customization | Customizing Claude Code UI |
 | **teams-channel-post-writer** | Teams communication | Writing professional posts |
--- a/QUICKSTART.zh-CN.md
+++ b/QUICKSTART.zh-CN.md
@@ -122,7 +122,7 @@ claude plugin marketplace add https://github.com/daymade/claude-code-skills
 # 在 Claude Code 内使用 `/plugin ...`，在终端中使用 `claude plugin ...`
 # 步骤 2：安装你需要的技能
 claude plugin install github-ops@daymade-skills
-claude plugin install markdown-tools@daymade-skills
+claude plugin install doc-to-markdown@daymade-skills
 # ... 根据需要添加更多
 # 步骤 3：重启 Claude Code
@@ -136,7 +136,7 @@ claude plugin install markdown-tools@daymade-skills
 |-------|-------------|-------------|
 | **skill-creator** ⭐ | 创建你自己的技能 | 构建自定义工作流 |
 | **github-ops** | GitHub 操作 | 管理 PR、问题、工作流 |
-| **markdown-tools** | 文档转换 | 将文档转换为 markdown |
+| **doc-to-markdown** | 文档转换 | 将文档转换为 markdown |
 | **mermaid-tools** | 图表生成 | 创建 PNG 图表 |
 | **statusline-generator** | 状态栏定制 | 自定义 Claude Code UI |
 | **teams-channel-post-writer** | Teams 通信 | 编写专业帖子 |
--- a/README.md
+++ b/README.md
@@ -146,7 +146,7 @@ claude plugin install skill-creator@daymade-skills
 claude plugin install github-ops@daymade-skills
 # Document conversion
-claude plugin install markdown-tools@daymade-skills
+claude plugin install doc-to-markdown@daymade-skills
 # Diagram generation
 claude plugin install mermaid-tools@daymade-skills
@@ -294,7 +294,7 @@ Comprehensive GitHub operations using gh CLI and GitHub API.
 ---
-### 2. **markdown-tools** - Document Conversion Suite
+### 2. **doc-to-markdown** - Document Conversion Suite
 Converts documents to markdown with Windows/WSL path handling and PDF image extraction.
@@ -313,7 +313,7 @@ Converts documents to markdown with Windows/WSL path handling and PDF image extr
 **🎬 Live Demo**
-![Markdown Tools Demo](./demos/markdown-tools/convert-docs.gif)
+![Markdown Tools Demo](./demos/doc-to-markdown/convert-docs.gif)
 ---
@@ -1838,7 +1838,7 @@ Want to see all demos in one place with click-to-enlarge functionality? Check ou
 Use **github-ops** to streamline PR creation, issue management, and API operations.
 ### For Documentation
-Combine **markdown-tools** for document conversion and **mermaid-tools** for diagram generation to create comprehensive documentation. Use **llm-icon-finder** to add brand icons.
+Combine **doc-to-markdown** for document conversion and **mermaid-tools** for diagram generation to create comprehensive documentation. Use **llm-icon-finder** to add brand icons.
 ### For Research & Analysis
 Use **deep-research** to produce format-controlled research reports with evidence tables and citations. Combine with **fact-checker** to validate claims or with **twitter-reader** for social-source collection.
@@ -1916,7 +1916,7 @@ Use **iOS-APP-developer** to configure XcodeGen projects, resolve SPM dependency
 Use **macos-cleaner** to intelligently analyze and reclaim disk space on macOS with safety-first approach. Unlike one-click cleaners that blindly delete, macos-cleaner explains what each file is, categorizes by risk level (🟢/🟡/🔴), and requires explicit confirmation before any deletion. Perfect for developers dealing with Docker/Homebrew/npm/pip cache bloat, users wanting to understand storage consumption, or anyone who values transparency over automation. Combines script-based precision with optional Mole visual tool integration for hybrid workflow.
 ### For Twitter/X Content Research
-Use **twitter-reader** to fetch tweet content without JavaScript rendering or authentication. Perfect for documenting social media discussions, archiving threads, analyzing tweet content, or gathering reference material from Twitter/X. Combine with **markdown-tools** to convert fetched content into other formats, or with **repomix-safe-mixer** to package research collections securely.
+Use **twitter-reader** to fetch tweet content without JavaScript rendering or authentication. Perfect for documenting social media discussions, archiving threads, analyzing tweet content, or gathering reference material from Twitter/X. Combine with **doc-to-markdown** to convert fetched content into other formats, or with **repomix-safe-mixer** to package research collections securely.
 ### For Skill Quality & Open-Source Contributions
 Use **skill-reviewer** to validate your own skills against best practices before publishing, or to review and improve others' skill repositories. Combine with **github-contributor** to find high-impact open-source projects, create professional PRs, and build your contributor reputation. Perfect for developers who want to contribute to the Claude Code ecosystem or any GitHub project systematically.
@@ -1947,7 +1947,7 @@ Each skill includes:
 ### Quick Links
 - **github-ops**: See `github-ops/references/api_reference.md` for API documentation
-- **markdown-tools**: See `markdown-tools/references/conversion-examples.md` for conversion scenarios
+- **doc-to-markdown**: See `doc-to-markdown/references/conversion-examples.md` for conversion scenarios
 - **mermaid-tools**: See `mermaid-tools/references/setup_and_troubleshooting.md` for setup guide
 - **statusline-generator**: See `statusline-generator/references/color_codes.md` for customization
 - **teams-channel-post-writer**: See `teams-channel-post-writer/references/writing-guidelines.md` for quality standards
@@ -1992,7 +1992,7 @@ Each skill includes:
 - **Claude Code** 2.0.13 or higher
 - **Python 3.6+** (for scripts in multiple skills)
 - **gh CLI** (for github-ops)
-- **markitdown** (for markdown-tools)
+- **markitdown** (for doc-to-markdown)
 - **mermaid-cli** (for mermaid-tools)
 - **yt-dlp** (for youtube-downloader): `brew install yt-dlp` or `pip install yt-dlp`
 - **FFmpeg/FFprobe** (for video-comparer): `brew install ffmpeg`, `apt install ffmpeg`, or `winget install ffmpeg`
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -146,7 +146,7 @@ claude plugin install skill-creator@daymade-skills
 claude plugin install github-ops@daymade-skills
 # 文档转换
-claude plugin install markdown-tools@daymade-skills
+claude plugin install doc-to-markdown@daymade-skills
 # 图表生成
 claude plugin install mermaid-tools@daymade-skills
@@ -319,7 +319,7 @@ CC-Switch 支持以下中国 AI 服务提供商：
 ---
-### 2. **markdown-tools** - 文档转换套件
+### 2. **doc-to-markdown** - 文档转换套件
 将文档转换为 markdown，支持 Windows/WSL 路径处理和 PDF 图片提取。
@@ -338,7 +338,7 @@ CC-Switch 支持以下中国 AI 服务提供商：
 **🎬 实时演示**
-![Markdown 工具演示](./demos/markdown-tools/convert-docs.gif)
+![Markdown 工具演示](./demos/doc-to-markdown/convert-docs.gif)
 ---
@@ -1880,7 +1880,7 @@ claude plugin install scrapling-skill@daymade-skills
 使用 **github-ops** 简化 PR 创建、问题管理和 API 操作。
 ### 文档处理
-结合 **markdown-tools** 进行文档转换和 **mermaid-tools** 进行图表生成，创建全面的文档。使用 **llm-icon-finder** 添加品牌图标。
+结合 **doc-to-markdown** 进行文档转换和 **mermaid-tools** 进行图表生成，创建全面的文档。使用 **llm-icon-finder** 添加品牌图标。
 ### 调研与分析
 使用 **deep-research** 生成格式可控的调研报告，包含证据表与引用。与 **fact-checker** 结合用于验证关键结论，或与 **twitter-reader** 结合收集社媒资料。
@@ -1952,7 +1952,7 @@ claude plugin install scrapling-skill@daymade-skills
 使用 **iOS-APP-developer** 配置 XcodeGen 项目，处理 SPM 依赖、签名与部署问题。
 ### Twitter/X 内容研究
-使用 **twitter-reader** 无需 JavaScript 渲染或身份验证即可获取推文内容。非常适合记录社交媒体讨论、归档话题、分析推文内容或从 Twitter/X 收集参考资料。与 **markdown-tools** 结合可将获取的内容转换为其他格式，或与 **repomix-safe-mixer** 结合安全地打包研究集合。
+使用 **twitter-reader** 无需 JavaScript 渲染或身份验证即可获取推文内容。非常适合记录社交媒体讨论、归档话题、分析推文内容或从 Twitter/X 收集参考资料。与 **doc-to-markdown** 结合可将获取的内容转换为其他格式，或与 **repomix-safe-mixer** 结合安全地打包研究集合。
 ### macOS 系统维护与磁盘空间恢复
 使用 **macos-cleaner** 以安全优先的方式智能分析和恢复 macOS 上的磁盘空间。与盲目删除的一键清理工具不同，macos-cleaner 解释每个文件是什么、按风险级别分类（🟢/🟡/🔴），并在任何删除前需要明确确认。非常适合处理 Docker/Homebrew/npm/pip 缓存膨胀的开发者、希望了解存储空间消耗的用户，或任何重视透明度而非自动化的人。结合基于脚本的精度和可选的 Mole 可视化工具集成以实现混合工作流。
@@ -1989,7 +1989,7 @@ claude plugin install scrapling-skill@daymade-skills
 ### 快速链接
 - **github-ops**：参见 `github-ops/references/api_reference.md` 了解 API 文档
-- **markdown-tools**：参见 `markdown-tools/references/conversion-examples.md` 了解转换场景
+- **doc-to-markdown**：参见 `doc-to-markdown/references/conversion-examples.md` 了解转换场景
 - **mermaid-tools**：参见 `mermaid-tools/references/setup_and_troubleshooting.md` 了解设置指南
 - **statusline-generator**：参见 `statusline-generator/references/color_codes.md` 了解自定义
 - **teams-channel-post-writer**：参见 `teams-channel-post-writer/references/writing-guidelines.md` 了解质量标准
@@ -2034,7 +2034,7 @@ claude plugin install scrapling-skill@daymade-skills
 - **Claude Code** 2.0.13 或更高版本
 - **Python 3.6+**（用于多个技能中的脚本）
 - **gh CLI**（用于 github-ops）
-- **markitdown**（用于 markdown-tools）
+- **markitdown**（用于 doc-to-markdown）
 - **mermaid-cli**（用于 mermaid-tools）
 - **VHS**（用于 cli-demo-generator）：`brew install vhs`
 - **asciinema**（可选，用于 cli-demo-generator 交互式录制）
--- a/demos/README.md
+++ b/demos/README.md
@@ -14,7 +14,7 @@ demos/
 │   └── package-skill.tape        # Package for distribution
 ├── github-ops/
 │   └── create-pr.tape            # Create pull requests
-├── markdown-tools/
+├── doc-to-markdown/
 │   └── convert-docs.tape         # Convert documents
 └── generate_all_demos.sh         # Generate all GIFs
 ```
--- a/doc-to-markdown/SKILL.md
+++ b/doc-to-markdown/SKILL.md
@@ -1,75 +1,68 @@
 ---
 name: doc-to-markdown
-description: Converts DOCX/PDF/PPTX to high-quality Markdown with automatic post-processing. Fixes pandoc grid tables, image paths, attribute noise, and code blocks. Supports Quick Mode (fast, single tool) and Heavy Mode (best quality, multi-tool merge). Trigger on "convert document", "docx to markdown", "parse word", "doc to markdown", "extract images from document".
+description: Converts DOCX/PDF/PPTX to high-quality Markdown with automatic post-processing. Fixes pandoc grid tables, simple tables, image paths, CJK bold spacing, attribute noise, and code blocks. Benchmarked best-in-class (7.6/10) against Docling, MarkItDown, Pandoc raw, and Mammoth. Trigger on "convert document", "docx to markdown", "parse word", "doc to markdown", "解析word", "转换文档".
 ---
 # Doc to Markdown
 Convert documents to high-quality markdown with intelligent multi-tool orchestration and automatic DOCX post-processing.
-## Dual Mode Architecture
+**Architecture**: Pandoc (best-in-class extraction) + 8 post-processing fixes (our value-add).
 ## Quick Start
 ```bash
 # DOCX → Markdown (one command, zero manual fixes)
 uv run --with pymupdf4llm --with markitdown scripts/convert.py document.docx -o output.md --assets-dir ./media
 # PDF → Markdown
 uv run --with pymupdf4llm --with markitdown scripts/convert.py document.pdf -o output.md
 # Run tests
 uv run --with pytest pytest scripts/test_convert.py -v
 ```
 ## Dual Mode
 | Mode | Speed | Quality | Use Case |
 |------|-------|---------|----------|
 | **Quick** (default) | Fast | Good | Drafts, simple documents |
 | **Heavy** | Slower | Best | Final documents, complex layouts |
-## Quick Start
+## Tool Selection
-### Installation
+| Format | Quick Mode | Heavy Mode |
-
+|--------|-----------|------------|
 ```bash
 # Required: PDF/DOCX/PPTX support
 uv tool install "markitdown[pdf]"
 pip install pymupdf4llm
 brew install pandoc
 ```
 ### Basic Conversion
 ```bash
 # Quick Mode (default) - fast, single best tool
 uv run --with pymupdf4llm --with markitdown scripts/convert.py document.pdf -o output.md
 # Heavy Mode - multi-tool parallel execution with merge
 uv run --with pymupdf4llm --with markitdown scripts/convert.py document.pdf -o output.md --heavy
 # DOCX with deep python-docx parsing (experimental)
 uv run --with pymupdf4llm --with markitdown --with python-docx scripts/convert.py document.docx -o output.md --docx-deep
 # Check available tools
 uv run scripts/convert.py --list-tools
 ```
 ## Tool Selection Matrix
 | Format | Quick Mode Tool | Heavy Mode Tools |
 |--------|----------------|------------------|
 | PDF | pymupdf4llm | pymupdf4llm + markitdown |
 | DOCX | pandoc + post-processing | pandoc + markitdown |
 | PPTX | markitdown | markitdown + pandoc |
 | XLSX | markitdown | markitdown |
 ### Tool Characteristics
 - **pymupdf4llm**: LLM-optimized PDF conversion with native table detection and image extraction
 - **markitdown**: Microsoft's universal converter, good for Office formats
 - **pandoc**: Excellent structure preservation for DOCX/PPTX
 ## DOCX Post-Processing (automatic)
-When converting DOCX files via pandoc, the following cleanups are applied automatically:
+When converting DOCX via pandoc, 8 cleanups are applied automatically:
-| Problem | Fix |
+| Problem | Fix | Test coverage |
-|---------|-----|
+|---------|-----|---------------|
-| Grid tables (`+:---+` syntax) | Single-column -> blockquote, multi-column -> split images |
+| Grid tables (`+:---+`) | Single-column → blockquote, multi-column → pipe table | `TestPostprocessPipeline` |
-| Image double path (`media/media/`) | Flatten to `media/` |
+| Simple tables (`  ---- ----`) | Multi-column images → pipe table with captions | `TestSimpleTable` |
-| Pandoc attributes (`{width="..." height="..."}`) | Removed |
+| Image path nesting (`media/media/`) | Flatten to `media/`, absolute → relative | `test_stats_tracking` |
-| Inline classes (`{.underline}`, `{.mark}`) | Removed |
+| Pandoc attributes (`{width="..."}`) | Removed | `test_pandoc_attributes_removed` |
-| Indented dashed code blocks | Converted to fenced code blocks (```) |
+| CJK bold spacing (`**粗体**中文`) | Add space around `**` for CJK bold spans | `TestCjkBoldSpacing` (15 cases) |
-| Escaped brackets (`\[...\]`) | Unescaped to `[...]` |
+| Indented dashed code blocks | → fenced ``` with language detection | `test_code_block_with_language` |
-| Double-bracket links (`[[text]{...}](url)`) | Simplified to `[text](url)` |
+| Escaped brackets (`\[...\]`) | → `[...]` | `test_escaped_brackets_fixed` |
-| Escaped quotes in code (`\"`) | Fixed to `"` |
+| Double-bracket links (`[[text]](url)`) | → `[text](url)` | `test_double_bracket_links_fixed` |
 ### CJK Bold Spacing — why and how
 DOCX uses run-level styling (no spaces between bold/normal runs in CJK text). Markdown renderers need whitespace around `**` to recognize bold boundaries.
 **Rule**: if a `**content**` span contains any CJK character, ensure both sides have a space — unless already spaced or at line boundary. This handles CJK punctuation, emoji adjacency, and mixed content.
 ```
 Before: 打开**飞书**，就可以    → some renderers fail to bold
 After:  打开 **飞书** ，就可以  → universally renders correctly
 ```
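
The rule above can be sketched as a standalone function. This is a minimal sketch and not the code in `convert.py`: the names `BOLD_PAIR`, `CJK`, and `space_cjk_bold` are hypothetical, and both regular expressions are assumptions, since the real `_RE_BOLD_PAIR` and `_RE_CJK_PUNCT` patterns are not shown in this diff.

```python
import re

# Assumed patterns (hypothetical stand-ins for _RE_BOLD_PAIR / _RE_CJK_PUNCT).
BOLD_PAIR = re.compile(r"\*\*(.+?)\*\*")
# CJK symbols and punctuation, CJK unified ideographs, full-width forms.
CJK = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def space_cjk_bold(text: str) -> str:
    """Pad **bold** spans that contain CJK with a space on each side."""
    out = []
    last_end = 0
    for m in BOLD_PAIR.finditer(text):
        start, end = m.span()
        out.append(text[last_end:start])
        if CJK.search(m.group(1)):
            # Pad before the span unless at text start or already spaced.
            if start > 0 and not text[start - 1].isspace():
                out.append(" ")
            out.append(m.group(0))
            # Pad after the span unless at text end or already spaced.
            if end < len(text) and not text[end].isspace():
                out.append(" ")
        else:
            # Bold spans without CJK content are copied through untouched.
            out.append(m.group(0))
        last_end = end
    out.append(text[last_end:])
    return "".join(out)


print(space_cjk_bold("打开**飞书**，就可以"))   # 打开 **飞书** ，就可以
print(space_cjk_bold("English **bold** text"))  # English **bold** text
```

Keying the decision on the span's content rather than on its neighbours is what lets the same rule cover emoji and punctuation adjacency: the neighbour only has to be non-whitespace.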
 ## Heavy Mode Workflow
@@ -166,6 +159,7 @@ brew install pandoc
 | Script | Purpose |
 |--------|---------|
 | `convert.py` | Main orchestrator with Quick/Heavy mode + DOCX post-processing |
 | `test_convert.py` | 31 tests covering all post-processing functions |
 | `merge_outputs.py` | Merge multiple markdown outputs |
 | `validate_output.py` | Quality validation with HTML report |
 | `extract_pdf_images.py` | PDF image extraction with metadata |
@@ -173,6 +167,7 @@ brew install pandoc
 ## References
 - `references/benchmark-2026-03-22.md` - 5-tool benchmark (Docling/MarkItDown/Pandoc/Mammoth/ours)
 - `references/heavy-mode-guide.md` - Detailed Heavy Mode documentation
 - `references/tool-comparison.md` - Tool capabilities comparison
 - `references/conversion-examples.md` - Batch operation examples
--- a/doc-to-markdown/references/heavy-mode-guide.md
+++ b/doc-to-markdown/references/heavy-mode-guide.md
@@ -1,6 +1,6 @@
 # Heavy Mode Guide
-Detailed documentation for markdown-tools Heavy Mode conversion.
+Detailed documentation for doc-to-markdown Heavy Mode conversion.
 ## Overview
--- a/doc-to-markdown/scripts/convert.py
+++ b/doc-to-markdown/scripts/convert.py
@@ -26,6 +26,7 @@ Dependencies:
 """
 import argparse
 import json
 import re
 import subprocess
 import sys
@@ -478,10 +479,19 @@ def _fix_code_blocks(text: str, stats: PostProcessStats) -> str:
                # Decide: code block vs blockquote
                if has_lang_hint or _is_code_content(cleaned):
-                    # Code block
+                    # Code block — try to pretty-print JSON
                    code_lines = cleaned
                    if lang_hint == "json":
                        try:
                            raw = "\n".join(cleaned)
                            parsed = json.loads(raw)
                            code_lines = json.dumps(parsed, indent=2, ensure_ascii=False).split("\n")
                        except (json.JSONDecodeError, ValueError):
                            pass  # Keep original if not valid JSON
                    result.append("")
                    result.append(f"```{lang_hint}")
-                    result.extend(cleaned)
+                    result.extend(code_lines)
                    result.append("```")
                    result.append("")
                else:
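
The JSON branch in this hunk can be read in isolation as a small helper. The sketch below assumes a hypothetical function name, `pretty_print_json_lines`, which does not exist in `convert.py`; it only mirrors the parse, re-indent, and fall-back behaviour shown above.

```python
import json
from typing import List


def pretty_print_json_lines(lines: List[str]) -> List[str]:
    """Re-indent code-block lines as 2-space JSON; return them unchanged if invalid."""
    try:
        parsed = json.loads("\n".join(lines))
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError, so one clause covers both.
        return lines
    # ensure_ascii=False keeps CJK text readable instead of emitting \uXXXX escapes.
    return json.dumps(parsed, indent=2, ensure_ascii=False).split("\n")


print(pretty_print_json_lines(['{"provider": "stepfun", "名称": "测试"}']))
print(pretty_print_json_lines(["not json at all"]))  # returned as-is
```

Falling back to the original lines means a block that is merely labelled as JSON (for example a truncated snippet) is still emitted as a fenced block rather than dropped.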
@@ -529,29 +539,40 @@ def _fix_double_bracket_links(text: str, stats: PostProcessStats) -> str:
 def _fix_cjk_bold_spacing(text: str) -> str:
-    """Add space between **bold** markers and adjacent CJK characters.
+    """Add space around **bold** spans that contain CJK characters.
    DOCX uses run-level styling for bold — no spaces between runs in CJK text.
    Markdown renderers need whitespace around ** to recognize bold boundaries.
-    We find each **content** span, check the character before/after, and insert
+
-    a space only when the adjacent character is CJK (avoiding double spaces).
+    Rule: if a **content** span contains any CJK character, ensure both sides
    have a space (unless already spaced or at line boundary). This handles:
    - CJK directly touching **: 打开**飞书** → 打开 **飞书**
    - Emoji touching **: **密码】**➡️ → **密码】** ➡️
    - Already spaced: 已有 **粗体** → unchanged
    - English bold: English **bold** text → unchanged
    """
    result = []
    last_end = 0
    for m in _RE_BOLD_PAIR.finditer(text):
        start, end = m.start(), m.end()
        content = m.group(1)
        result.append(text[last_end:start])
-        # Space before opening ** if preceded by CJK
+        # Only add spaces for bold spans containing CJK
-        if start > 0 and _RE_CJK_PUNCT.match(text[start - 1]):
+        if _RE_CJK_PUNCT.search(content):
-            result.append(' ')
+            # Space before ** if previous char is not whitespace
            if start > 0 and text[start - 1] not in (' ', '\t', '\n'):
                result.append(' ')
-        result.append(m.group(0))
+            result.append(m.group(0))
-        # Space after closing ** if followed by CJK
+            # Space after ** if next char is not whitespace
-        if end < len(text) and _RE_CJK_PUNCT.match(text[end]):
+            if end < len(text) and text[end] not in (' ', '\t', '\n'):
-            result.append(' ')
+                result.append(' ')
        else:
            result.append(m.group(0))
        last_end = end
--- a/doc-to-markdown/scripts/test_convert.py
+++ b/doc-to-markdown/scripts/test_convert.py
@@ -0,0 +1,242 @@
 """Tests for doc-to-markdown convert.py post-processing functions.
 Run: uv run --with pytest pytest scripts/test_convert.py -v
 """
 import pytest
 import re
 import sys
 from pathlib import Path
 # Import the module under test
 sys.path.insert(0, str(Path(__file__).parent))
 from convert import (
    _fix_cjk_bold_spacing,
    _build_pipe_table,
    _collect_images,
    PostProcessStats,
    postprocess_docx_markdown,
 )
 # ── CJK Bold Spacing ─────────────────────────────────────────────────────────
 class TestCjkBoldSpacing:
    """Test _fix_cjk_bold_spacing: spaces between **bold** and CJK chars."""
    def test_bold_followed_by_cjk_punctuation(self):
        """**text** directly touching CJK colon → add space after **."""
        inp = "**打开阶跃开放平台链接**：https://platform.stepfun.com/"
        out = _fix_cjk_bold_spacing(inp)
        assert "**打开阶跃开放平台链接** ：" in out
    def test_cjk_before_bold(self):
        """CJK char directly before ** → add space before **."""
        assert _fix_cjk_bold_spacing("可用**手机号**进行") == "可用 **手机号** 进行"
    def test_bold_with_emoji_neighbor(self):
        """**text** touching emoji ➡️ → still add space (CJK content rule)."""
        inp = "点击**【接口密码】**➡️**【创建新的密钥**】"
        out = _fix_cjk_bold_spacing(inp)
        # Each CJK-containing bold span should have spaces on both sides
        assert "点击 **【接口密码】** ➡️" in out
        assert "➡️ **【创建新的密钥**" in out
    def test_full_emoji_line(self):
        """Complete line with emoji separators between bold spans."""
        inp = "点击**【接口密码】**➡️**【创建新的密钥**】➡️**【输入密钥名称】**（输入你想取的名称），生成API Key"
        out = _fix_cjk_bold_spacing(inp)
        assert "点击 **【接口密码】** ➡️" in out
        assert "**【输入密钥名称】** （输入" in out
    def test_bold_between_cjk(self):
        """CJK **text** CJK → spaces on both sides."""
        assert _fix_cjk_bold_spacing("打开**飞书**，就可以") == "打开 **飞书** ，就可以"
    def test_bold_with_chinese_quotes(self):
        """Bold containing Chinese quotes."""
        inp = '有个**"企鹅戴龙虾头套的机器人"**，开始'
        out = _fix_cjk_bold_spacing(inp)
        assert '**"企鹅戴龙虾头套的机器人"** ，' in out
    def test_multiple_bold_spans(self):
        """Multiple bold spans in one line."""
        assert _fix_cjk_bold_spacing("这是**测试**和**验证**的效果") == "这是 **测试** 和 **验证** 的效果"
    def test_already_spaced(self):
        """Already has spaces → no double spaces."""
        inp = "已有空格 **粗体** 不需要再加"
        assert _fix_cjk_bold_spacing(inp) == inp
    def test_english_unchanged(self):
        """English bold text should not be modified."""
        inp = "English **bold** text should not change"
        assert _fix_cjk_bold_spacing(inp) == inp
    def test_line_start_bold(self):
        """Bold at line start followed by CJK."""
        assert _fix_cjk_bold_spacing("**重要**内容") == "**重要** 内容"
    def test_line_start_bold_standalone(self):
        """Bold at line start with no CJK neighbor → no change."""
        assert _fix_cjk_bold_spacing("**这是纯粗体不需要改**") == "**这是纯粗体不需要改**"
    def test_no_bold(self):
        """Text without bold markers → unchanged."""
        inp = "这是普通文本，没有粗体"
        assert _fix_cjk_bold_spacing(inp) == inp
    def test_empty_string(self):
        assert _fix_cjk_bold_spacing("") == ""
    def test_bold_at_line_end(self):
        """Bold at line end → no trailing space needed."""
        assert _fix_cjk_bold_spacing("内容是**粗体**") == "内容是 **粗体**"
    def test_mixed_cjk_and_english_bold(self):
        """English bold between CJK → no change (no CJK in content)."""
        inp = "请使用 **API Key** 进行认证"
        assert _fix_cjk_bold_spacing(inp) == inp
 # ── Pipe Table Builder ────────────────────────────────────────────────────────
 class TestBuildPipeTable:
    """Test _build_pipe_table: rows → markdown pipe table."""
    def test_basic_table(self):
        rows = [["a", "b"], ["c", "d"]]
        result = _build_pipe_table(rows)
        assert result == [
            "|  |  |",
            "| --- | --- |",
            "| a | b |",
            "| c | d |",
        ]
    def test_uneven_rows(self):
        """Rows with different column counts → padded."""
        rows = [["a", "b", "c"], ["d"]]
        result = _build_pipe_table(rows)
        assert "| d |  |  |" in result
    def test_single_cell(self):
        rows = [["only"]]
        result = _build_pipe_table(rows)
        assert len(result) == 3  # header + sep + 1 row
    def test_empty_rows(self):
        assert _build_pipe_table([]) == []
    def test_image_with_caption(self):
        """Images and captions should pair correctly in table."""
        rows = [
            ["![](img1.png)", "![](img2.png)"],
            ["Step 1", "Step 2"],
        ]
        result = _build_pipe_table(rows)
        assert "| ![](img1.png) | ![](img2.png) |" in result
        assert "| Step 1 | Step 2 |" in result
 # ── Full Post-Processing Pipeline ─────────────────────────────────────────────
 class TestPostprocessPipeline:
    """Integration tests for the full postprocess_docx_markdown pipeline."""
    def test_grid_table_single_column_to_blockquote(self):
        """Single-column grid table → blockquote."""
        inp = """+:---+
 | 注意事项 |
 +----+"""
        out, stats = postprocess_docx_markdown(inp)
        assert "> 注意事项" in out
        assert "+:---+" not in out
    def test_pandoc_attributes_removed(self):
        """Pandoc {width=...} and {.underline} removed."""
        inp = '![](img.png){width="5in" height="3in"} and [text]{.underline}'
        out, stats = postprocess_docx_markdown(inp)
        assert "{width=" not in out
        assert "{.underline}" not in out
        assert "![](img.png)" in out
    def test_escaped_brackets_fixed(self):
        r"""Pandoc \[ and \] → [ and ]."""
        inp = r"你 \[在飞书里\] 发消息"
        out, stats = postprocess_docx_markdown(inp)
        assert "你 [在飞书里] 发消息" in out
    def test_double_bracket_links_fixed(self):
        """[[text]](url) → [text](url)."""
        inp = "[[点击跳转]](https://example.com)"
        out, stats = postprocess_docx_markdown(inp)
        assert "[点击跳转](https://example.com)" in out
    def test_code_block_with_language(self):
        """Indented dashed block with JSON language hint → ```json."""
        inp = """  ------------------------------------------------------------------
  JSON\\
  {\\
  "provider": "stepfun"\\
  }
  ------------------------------------------------------------------"""
        out, stats = postprocess_docx_markdown(inp)
        assert "```json" in out
        assert '"provider": "stepfun"' in out
        assert "---" not in out
    def test_code_block_plain_text_to_blockquote(self):
        """Indented dashed block with plain text → blockquote."""
        inp = """  --------------------------
  注意：这是一条重要提示
  --------------------------"""
        out, stats = postprocess_docx_markdown(inp)
        assert "> 注意：这是一条重要提示" in out
    def test_cjk_bold_spacing_in_pipeline(self):
        """CJK bold spacing is applied in the full pipeline."""
        inp = "打开**飞书**，就可以看到"
        out, stats = postprocess_docx_markdown(inp)
        assert "打开 **飞书** ，就可以看到" in out
    def test_excessive_blank_lines_collapsed(self):
        """4+ blank lines → 2 blank lines."""
        inp = "line1\n\n\n\n\nline2"
        out, stats = postprocess_docx_markdown(inp)
        assert out.count("\n") < 5
    def test_stats_tracking(self):
        """Stats object correctly tracks fix counts."""
        inp = '![](media/media/img.png){width="5in"}'
        out, stats = postprocess_docx_markdown(inp)
        assert stats.attributes_removed > 0
 # ── Simple Table (pandoc) ─────────────────────────────────────────────────────
 class TestSimpleTable:
    """Test pandoc simple table (indented dashes with spaces) → pipe table."""
    def test_two_column_image_table(self):
        """Two images side by side in simple table → pipe table."""
        inp = """  ---- ----
   ![](img1.png)   ![](img2.png)
  ---- ----"""
        out, stats = postprocess_docx_markdown(inp)
        assert "| ![](img1.png) | ![](img2.png) |" in out
        assert "----" not in out
    def test_four_column_image_table(self):
        """Four images in simple table → 4-column pipe table."""
        inp = """  ---------- ---------- ---------- ----------
   ![](a.png)   ![](b.png)   ![](c.png)   ![](d.png)
  ---------- ---------- ---------- ----------"""
        out, stats = postprocess_docx_markdown(inp)
        assert "| ![](a.png) | ![](b.png) | ![](c.png) | ![](d.png) |" in out
--- a/meeting-minutes-taker/SKILL.md
+++ b/meeting-minutes-taker/SKILL.md
@@ -11,7 +11,7 @@ Transform raw meeting transcripts into comprehensive, evidence-based meeting min
 ## Quick Start
 **Pre-processing (Optional but Recommended):**
-- **Document conversion**: Use `markdown-tools` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
+- **Document conversion**: Use `doc-to-markdown` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
 - **Transcript cleanup**: Use `transcript-fixer` skill to fix ASR/STT errors if transcript quality is poor
 - **Context file**: Prepare `context.md` with team directory for accurate speaker identification
@@ -457,7 +457,7 @@ If v3 has a flowchart for "Status Query Mechanism" but v1/v2 don't have it, that
 **Full pipeline for .docx transcripts:**
 ```
-Step 0: markdown-tools      # Convert .docx → Markdown (preserves tables/images)
+Step 0: doc-to-markdown      # Convert .docx → Markdown (preserves tables/images)
        ↓
 Step 0.5: transcript-fixer  # Fix ASR errors (optional, if quality is poor)
        ↓