feat: optimize skills + add pipeline handoff chaining across 9 skills

asr-transcribe-to-text:
- Add local MLX transcription path (macOS Apple Silicon, 15-27x realtime)
- Add bundled script transcribe_local_mlx.py with max_tokens=200000
- Add local_mlx_guide.md with benchmarks and truncation trap docs
- Auto-detect platform and recommend local vs remote mode
- Fix audio extraction format (MP3 → WAV 16kHz mono PCM)
- Add Step 5: recommend transcript-fixer after transcription

transcript-fixer:
- Optimize SKILL.md from 289 → 153 lines (best practices compliance)
- Move FALSE_POSITIVE_RISKS (40 lines) to references/false_positive_guide.md
- Move Example Session to references/example_session.md
- Improve description for better triggering (226 → 580 chars)
- Add handoff to meeting-minutes-taker

skill-creator:
- Add "Pipeline Handoff" pattern to Skill Writing Guide
- Add pipeline check reminder in Step 4 (Edit the Skill)

Pipeline handoffs added to 8 skills forming 6 chains:
- youtube-downloader → asr-transcribe-to-text → transcript-fixer → meeting-minutes-taker → pdf/ppt-creator
- deep-research → fact-checker → pdf/ppt-creator
- doc-to-markdown → docs-cleaner / fact-checker
- claude-code-history-files-finder → continue-claude-work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:27:23 +08:00
parent ccc10f3417
commit 5c9eda4fbd
13 changed files with 567 additions and 294 deletions
--- a/transcript-fixer/references/example_session.md
+++ b/transcript-fixer/references/example_session.md
@@ -0,0 +1,25 @@
+# Example Session
+
+## Input transcript (`meeting.md`)
+```
+今天我们讨论了巨升智能的最新进展。
+股价系统需要优化，目前性能不够好。
+```
+
+## After Stage 1 (`meeting_stage1.md`)
+```
+今天我们讨论了具身智能的最新进展。  ← "巨升"→"具身" corrected
+股价系统需要优化，目前性能不够好。  ← Unchanged (not in dictionary)
+```
+
+## After Stage 2 (`meeting_stage2.md`)
+```
+今天我们讨论了具身智能的最新进展。
+框架系统需要优化，目前性能不够好。  ← "股价"→"框架" corrected by AI
+```
+
+## Learned pattern detected
+```
+✓ Detected: "股价" → "框架" (confidence: 85%, count: 1)
+  Run --review-learned after 2 more occurrences to approve
+```
--- a/transcript-fixer/references/false_positive_guide.md
+++ b/transcript-fixer/references/false_positive_guide.md
@@ -0,0 +1,39 @@
+# False Positive Prevention Guide
+
+Dictionary-based corrections are powerful but dangerous. Adding the wrong rule silently corrupts every future transcript. The `--add` command runs safety checks automatically, but you must understand the risks.
+
+## What is safe to add
+
+- **ASR-specific gibberish**: "巨升智能" -> "具身智能" (no real word sounds like "巨升智能")
+- **Long compound errors**: "语音是别" -> "语音识别" (4+ chars, unlikely to collide)
+- **English transliteration errors**: "japanese 3 pro" -> "Gemini 3 Pro"
+
+## What is NEVER safe to add
+
+- **Common Chinese words**: "仿佛", "正面", "犹豫", "传说", "增加", "教育" -- these appear correctly in normal text. Replacing them corrupts transcripts from better ASR models.
+- **Words <=2 characters**: Almost any 2-char Chinese string is a valid word or part of one. "线数" inside "产线数据" becomes "产线束据".
+- **Both sides are real words**: "仿佛" -> "反复", "犹豫" -> "抑郁" -- both forms are valid Chinese. The "error" is only an error for one specific ASR model.
+
+## When in doubt, use a context rule instead
+
+Context rules use regex patterns that match only in specific surroundings, avoiding false positives:
+```bash
+# Instead of: --add "线数" "线束"
+# Use a context rule in the database:
+sqlite3 ~/.transcript-fixer/corrections.db "INSERT INTO context_rules (pattern, replacement, description, priority) VALUES ('(?<!产)线数(?!据)', '线束', 'ASR: 线数->线束 (not inside 产线数据)', 10);"
+```
+
+## Auditing the dictionary
+
+Run `--audit` periodically to scan all rules for false positive risks:
+```bash
+uv run scripts/fix_transcription.py --audit
+uv run scripts/fix_transcription.py --audit --domain manufacturing
+```
+
+## Forcing a risky addition
+
+If you understand the risks and still want to add a flagged rule:
+```bash
+uv run scripts/fix_transcription.py --add "仿佛" "反复" --domain general --force
+```
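
The difference between a plain dictionary rule and the context rule in `false_positive_guide.md` can be checked with a short Python sketch. It assumes the stored `pattern` is applied with Python-compatible regex semantics (lookbehind and lookahead); the helper names `apply_dictionary_rule` and `apply_context_rule` are hypothetical and are not taken from `fix_transcription.py`.

```python
import re

# Hypothetical helper: a plain dictionary rule modelled as substring replacement.
def apply_dictionary_rule(text: str, wrong: str, right: str) -> str:
    return text.replace(wrong, right)

# The context-rule pattern from the guide: match "线数" only when it is
# not preceded by "产" and not followed by "据".
CONTEXT_PATTERN = re.compile(r"(?<!产)线数(?!据)")

# Hypothetical helper: apply the context rule with re.sub.
def apply_context_rule(text: str) -> str:
    return CONTEXT_PATTERN.sub("线束", text)

correct_text = "产线数据"        # valid text that must stay untouched
asr_error_text = "这批线数有问题"  # illustrative sentence with the ASR error

# The plain rule corrupts the valid word, as the guide warns.
naive_result = apply_dictionary_rule(correct_text, "线数", "线束")

# The context rule leaves the valid word alone and still fixes the real error.
safe_result = apply_context_rule(correct_text)
fixed_result = apply_context_rule(asr_error_text)

print(naive_result, safe_result, fixed_result)
```

Note that the two lookarounds are independent: the pattern is skipped if either "产" precedes or "据" follows, so it is slightly more conservative than "not inside 产线数据" alone.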