feat(doc-to-markdown): CJK bold spacing, JSON pretty-print, 31 tests, full rename cleanup
- Add CJK bold spacing fix: insert spaces around **bold** spans containing CJK characters for correct rendering (handles emoji adjacency, already-spaced)
- Add JSON pretty-print: auto-format JSON code blocks with 2-space indent
- Add 31 unit tests covering all post-processing functions
- Fix pandoc simple table detection (1-space column gaps)
- Fix image path double-nesting when --assets-dir ends with 'media'
- Rename all markdown-tools references across 15 files (README, QUICKSTART, marketplace.json, CLAUDE.md, meeting-minutes-taker, GitHub templates)
- Add 5-tool benchmark report (Docling/MarkItDown/Pandoc/Mammoth/ours)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
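The two post-processing steps named in the message could look roughly like the sketch below. It is an illustration only: the function names `space_cjk_bold` and `pretty_print_json_blocks`, the CJK ranges, and the regexes are assumptions, not the code in this commit, and the emoji-adjacency handling mentioned above is not reproduced.

```python
import json
import re

# Hypothetical names and patterns; the real implementation is not shown in this commit view.
FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example
_CJK = r"\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"  # assumed ranges: Han, kana, Hangul
_BOLD = re.compile(r"\*\*([^*\n]+)\*\*")
_JSON_BLOCK = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def space_cjk_bold(text: str) -> str:
    """Pad **bold** spans that contain CJK characters with single spaces."""

    def _pad(match: re.Match) -> str:
        span = match.group(0)
        if not re.search(f"[{_CJK}]", match.group(1)):
            return span  # no CJK inside the span: leave it alone
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        # Skip padding at string boundaries and where whitespace already exists.
        left = "" if before in ("", " ", "\n") else " "
        right = "" if after in ("", " ", "\n") else " "
        return left + span + right

    return _BOLD.sub(_pad, text)


def pretty_print_json_blocks(markdown: str) -> str:
    """Re-indent fenced json blocks with 2 spaces; invalid JSON is left untouched."""

    def _reformat(match: re.Match) -> str:
        try:
            parsed = json.loads(match.group(1))
        except json.JSONDecodeError:
            return match.group(0)
        body = json.dumps(parsed, indent=2, ensure_ascii=False)
        return f"{FENCE}json\n{body}\n{FENCE}"

    return _JSON_BLOCK.sub(_reformat, markdown)
```

Both helpers are idempotent in this sketch: running them on already-spaced bold spans or already-formatted JSON returns the input unchanged.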
@@ -11,7 +11,7 @@ Transform raw meeting transcripts into comprehensive, evidence-based meeting min
## Quick Start
**Pre-processing (Optional but Recommended):**
-- **Document conversion**: Use `markdown-tools` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
+- **Document conversion**: Use `doc-to-markdown` skill to convert .docx/.pdf to Markdown first (preserves tables/images)
- **Transcript cleanup**: Use `transcript-fixer` skill to fix ASR/STT errors if transcript quality is poor
- **Context file**: Prepare `context.md` with team directory for accurate speaker identification
@@ -457,7 +457,7 @@ If v3 has a flowchart for "Status Query Mechanism" but v1/v2 don't have it, that
**Full pipeline for .docx transcripts:**
```
-Step 0: markdown-tools    # Convert .docx → Markdown (preserves tables/images)
+Step 0: doc-to-markdown   # Convert .docx → Markdown (preserves tables/images)
↓
Step 0.5: transcript-fixer # Fix ASR errors (optional, if quality is poor)
↓