feat(doc-to-markdown): CJK bold spacing, JSON pretty-print, 31 tests, full rename cleanup

- Add CJK bold spacing fix: insert spaces around **bold** spans containing CJK characters for correct rendering (handles emoji adjacency, already-spaced)
- Add JSON pretty-print: auto-format JSON code blocks with 2-space indent
- Add 31 unit tests covering all post-processing functions
- Fix pandoc simple table detection (1-space column gaps)
- Fix image path double-nesting when --assets-dir ends with 'media'
- Rename all markdown-tools references across 15 files (README, QUICKSTART, marketplace.json, CLAUDE.md, meeting-minutes-taker, GitHub templates)
- Add 5-tool benchmark report (Docling/MarkItDown/Pandoc/Mammoth/ours)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
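
For readers who want a concrete picture of the two new post-processing steps named in the message, here is a minimal sketch. It is not the code from this commit: the function names `space_cjk_bold` and `pretty_json_blocks`, the CJK character ranges, and the regexes are all assumptions made for illustration, and the emoji handling mentioned in the message is not reproduced.

```python
import json
import re

# Assumption: "CJK" means kana, CJK ideographs (incl. Extension A), and Hangul.
HAS_CJK = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")
# A bold span: ** + text that neither starts nor ends with whitespace + **
BOLD = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")
FENCE = "`" * 3  # built at runtime so this example contains no literal fence
JSON_BLOCK = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def space_cjk_bold(line: str) -> str:
    """Pad bold spans that contain CJK text with a space on each side,
    unless that side is already whitespace or the edge of the line."""
    out, pos = [], 0
    for m in BOLD.finditer(line):
        out.append(line[pos:m.start()])
        span = m.group(0)
        if HAS_CJK.search(m.group(1)):
            before = line[m.start() - 1] if m.start() > 0 else ""
            after = line[m.end()] if m.end() < len(line) else ""
            if before and not before.isspace():
                span = " " + span
            if after and not after.isspace():
                span = span + " "
        out.append(span)
        pos = m.end()
    out.append(line[pos:])
    return "".join(out)


def pretty_json_blocks(md: str) -> str:
    """Re-indent json-fenced blocks with 2 spaces; leave invalid JSON untouched."""
    def repl(m: re.Match) -> str:
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            return m.group(0)
        body = json.dumps(data, indent=2, ensure_ascii=False)
        return f"{FENCE}json\n{body}\n{FENCE}"
    return JSON_BLOCK.sub(repl, md)
```

In this sketch a line that is already spaced passes through unchanged, and bold spans without CJK characters are never touched. A real implementation would probably also need to decide what to do next to punctuation, which this version pads like any other character.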
2026-03-23 03:18:37 +08:00
parent a5f3a4bfbe
commit d9e1967689
16 changed files with 351 additions and 90 deletions
--- a/meeting-minutes-taker/SKILL.md
+++ b/meeting-minutes-taker/SKILL.md
@@ -11,7 +11,7 @@ Transform raw meeting transcripts into comprehensive, evidence-based meeting min
 ## Quick Start

 **Pre-processing (Optional but Recommended):**
-- **Document conversion**: Use `markdown-tools` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
+- **Document conversion**: Use `doc-to-markdown` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
 - **Transcript cleanup**: Use `transcript-fixer` skill to fix ASR/STT errors if transcript quality is poor
 - **Context file**: Prepare `context.md` with team directory for accurate speaker identification

@@ -457,7 +457,7 @@ If v3 has a flowchart for "Status Query Mechanism" but v1/v2 don't have it, that
 **Full pipeline for .docx transcripts:**

 ```
-Step 0: markdown-tools      # Convert .docx → Markdown (preserves tables/images)
+Step 0: doc-to-markdown      # Convert .docx → Markdown (preserves tables/images)
        ↓
 Step 0.5: transcript-fixer  # Fix ASR errors (optional, if quality is poor)
        ↓