claude-code-skills-reference/transcript-fixer/references/false_positive_guide.md
daymade 5c9eda4fbd feat: optimize skills + add pipeline handoff chaining across 9 skills
asr-transcribe-to-text:
- Add local MLX transcription path (macOS Apple Silicon, 15-27x realtime)
- Add bundled script transcribe_local_mlx.py with max_tokens=200000
- Add local_mlx_guide.md with benchmarks and truncation trap docs
- Auto-detect platform and recommend local vs remote mode
- Fix audio extraction format (MP3 → WAV 16kHz mono PCM)
- Add Step 5: recommend transcript-fixer after transcription

transcript-fixer:
- Optimize SKILL.md from 289 → 153 lines (best practices compliance)
- Move FALSE_POSITIVE_RISKS (40 lines) to references/false_positive_guide.md
- Move Example Session to references/example_session.md
- Improve description for better triggering (226 → 580 chars)
- Add handoff to meeting-minutes-taker

skill-creator:
- Add "Pipeline Handoff" pattern to Skill Writing Guide
- Add pipeline check reminder in Step 4 (Edit the Skill)

Pipeline handoffs added to 8 skills forming 6 chains:
- youtube-downloader → asr-transcribe-to-text → transcript-fixer → meeting-minutes-taker → pdf/ppt-creator
- deep-research → fact-checker → pdf/ppt-creator
- doc-to-markdown → docs-cleaner / fact-checker
- claude-code-history-files-finder → continue-claude-work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:27:23 +08:00

False Positive Prevention Guide

Dictionary-based corrections are powerful but dangerous: a wrong rule silently corrupts every future transcript it matches. The --add command runs safety checks automatically, but you still need to understand the risks yourself.

What is safe to add

  • ASR-specific gibberish: "巨升智能" -> "具身智能" ("巨升智能" is not a real word, so it never appears in correct text)
  • Long compound errors: "语音是别" -> "语音识别" (4+ chars, unlikely to collide)
  • English transliteration errors: "japanese 3 pro" -> "Gemini 3 Pro"

What is NEVER safe to add

  • Common Chinese words: "仿佛", "正面", "犹豫", "传说", "增加", "教育" -- these appear correctly in normal text. Replacing them corrupts transcripts from ASR models that transcribed them correctly.
  • Words of 2 characters or fewer: Almost any 2-char Chinese string is a valid word or part of one. A rule "线数" -> "线束" also fires inside "产线数据", turning it into "产线束据".
  • Both sides are real words: "仿佛" -> "反复", "犹豫" -> "抑郁" -- both forms are valid Chinese. The "error" is only an error for one specific ASR model.
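
The criteria above can be sketched as a small pre-check. This is an illustrative heuristic only, not the actual safety check behind --add; the function name assess_rule and the KNOWN_WORDS set are hypothetical stand-ins for whatever vocabulary you trust.

```python
from typing import Iterable, List

# Hypothetical vocabulary of words known to be valid in correct text.
KNOWN_WORDS = {"仿佛", "反复", "犹豫", "抑郁", "产线数据"}


def assess_rule(wrong: str, right: str, known_words: Iterable[str]) -> List[str]:
    """Return risk warnings for a proposed rule wrong -> right (empty list = no warning)."""
    vocabulary = set(known_words)
    warnings = []

    # Very short sources collide with other words almost by default.
    if len(wrong) <= 2:
        warnings.append("source is 2 characters or fewer")

    # A source that is itself a valid word appears in correct transcripts too.
    if wrong in vocabulary:
        warnings.append("source is a real word")
        if right in vocabulary:
            warnings.append("both sides are real words")

    # A plain replacement also fires when the source sits inside a longer word.
    collisions = sorted(w for w in vocabulary if wrong in w and w != wrong)
    if collisions:
        warnings.append("source occurs inside: " + ", ".join(collisions))

    return warnings


if __name__ == "__main__":
    for wrong, right in [("巨升智能", "具身智能"), ("线数", "线束"), ("仿佛", "反复")]:
        print(wrong, "->", right, assess_rule(wrong, right, KNOWN_WORDS))
```

A rule that produces no warnings here is not proven safe; the check only covers the words in the vocabulary it is given.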

When in doubt, use a context rule instead

Context rules use regex patterns that match only in specific surroundings, avoiding false positives:

# Instead of: --add "线数" "线束"
# Use a context rule in the database:
sqlite3 ~/.transcript-fixer/corrections.db "INSERT INTO context_rules (pattern, replacement, description, priority) VALUES ('(?<!产)线数(?!据)', '线束', 'ASR: 线数->线束 (not inside 产线数据)', 10);"
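
To see what the lookarounds in that pattern buy you, the sketch below compares a plain replacement with the context pattern, using Python's re module. It assumes the tool evaluates context rules with regex semantics compatible with Python's re; the helper names are hypothetical.

```python
import re

# Same pattern as the context rule above: not preceded by 产, not followed by 据.
CONTEXT_PATTERN = re.compile(r"(?<!产)线数(?!据)")


def apply_naive(text: str) -> str:
    """Plain dictionary replacement: fires on every occurrence."""
    return text.replace("线数", "线束")


def apply_context(text: str) -> str:
    """Context-aware replacement: skips occurrences excluded by the lookarounds."""
    return CONTEXT_PATTERN.sub("线束", text)


if __name__ == "__main__":
    for sample in ["产线数据", "检查线数是否松动"]:
        print(sample, "| naive:", apply_naive(sample), "| context:", apply_context(sample))
```

Note that the two lookarounds act independently: the pattern skips any "线数" preceded by "产" or followed by "据", which is slightly broader than excluding only the exact string "产线数据". Test a new pattern against sample text before inserting it.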

Auditing the dictionary

Run --audit periodically to scan all rules for false positive risks:

uv run scripts/fix_transcription.py --audit
uv run scripts/fix_transcription.py --audit --domain manufacturing

Forcing a risky addition

If you understand the risks and still want to add a rule that the safety checks flag, pass --force:

uv run scripts/fix_transcription.py --add "仿佛" "反复" --domain general --force