asr-transcribe-to-text:
- Add local MLX transcription path (macOS Apple Silicon, 15-27x realtime)
- Add bundled script transcribe_local_mlx.py with max_tokens=200000
- Add local_mlx_guide.md with benchmarks and truncation trap docs
- Auto-detect platform and recommend local vs remote mode
- Fix audio extraction format (MP3 → WAV 16kHz mono PCM)
- Add Step 5: recommend transcript-fixer after transcription

transcript-fixer:
- Optimize SKILL.md from 289 → 153 lines (best practices compliance)
- Move FALSE_POSITIVE_RISKS (40 lines) to references/false_positive_guide.md
- Move Example Session to references/example_session.md
- Improve description for better triggering (226 → 580 chars)
- Add handoff to meeting-minutes-taker

skill-creator:
- Add "Pipeline Handoff" pattern to Skill Writing Guide
- Add pipeline check reminder in Step 4 (Edit the Skill)

Pipeline handoffs added to 8 skills forming 6 chains:
- youtube-downloader → asr-transcribe-to-text → transcript-fixer → meeting-minutes-taker → pdf/ppt-creator
- deep-research → fact-checker → pdf/ppt-creator
- doc-to-markdown → docs-cleaner / fact-checker
- claude-code-history-files-finder → continue-claude-work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Local MLX Transcription Guide
## Platform Requirements
- macOS on Apple Silicon (M1/M2/M3/M4/M5+)
- Python 3.10+
- uv package manager
- ~3GB disk for model weights (first download)
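The skill auto-detects whether the local path is viable before recommending it. A minimal version of that check can be written with the standard library; the helper name below is illustrative, not the bundled script's actual API:

```python
import platform

def local_mlx_supported() -> bool:
    """True on macOS running Apple Silicon, where MLX inference is available."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

# Fall back to the remote transcription path everywhere else.
mode = "local" if local_mlx_supported() else "remote"
```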
## Recommended Configuration
| Setting | Value | Why |
|---|---|---|
| Model | mlx-community/Qwen3-ASR-1.7B-8bit | 8-bit quantized, fast inference, good quality |
| max_tokens | 200000 | Default 8192 silently truncates audio >40 min |
| Audio format | WAV 16kHz mono PCM | Best compatibility with ASR models |
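The MP3 → WAV 16kHz mono PCM conversion maps to a standard ffmpeg invocation. A sketch of the command builder (hypothetical helper name; assumes ffmpeg is on PATH):

```python
def ffmpeg_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command converting any input to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # single (mono) channel
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]

# Requires ffmpeg installed:
# subprocess.run(ffmpeg_wav_cmd("talk.mp3", "talk.wav"), check=True)
```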
## Performance Benchmarks (M5 Pro 48GB, April 2026)
| Audio Length | Inference Time | Speed | Chars | Tokens |
|---|---|---|---|---|
| 1 min | 3.7s | 16x realtime | 295 | ~180 |
| 5 min | 11.1s | 27x realtime | 1,633 | ~980 |
| 15 min | 50.5s | 17.8x realtime | 5,074 | ~3,045 |
| 123 min | 502s (8m22s) | 14.7x realtime | 40,347 | 24,337 |
| 96 min | 409s (6m48s) | 14.1x realtime | 30,018 | 18,214 |
Model load: ~4s (cached), ~130s (first download).
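The speed column is simply audio duration divided by inference time; recomputing it from the table rows:

```python
def realtime_factor(audio_minutes: float, inference_seconds: float) -> float:
    """Speedup over realtime: audio duration / inference time."""
    return audio_minutes * 60 / inference_seconds

# Rows from the benchmark table above
assert round(realtime_factor(123, 502), 1) == 14.7
assert round(realtime_factor(96, 409), 1) == 14.1
```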
## Critical: `max_tokens` Truncation
The `model.generate()` method in mlx-audio defaults to `max_tokens=8192`. This budget is global, shared across all audio chunks rather than allocated per chunk; once it is exhausted, the remaining chunks are silently skipped.
For 123 minutes of Chinese speech:
- Required: ~24,000 tokens
- Default budget: 8,192 tokens
- Result: only first ~40 minutes transcribed, rest silently dropped
Always pass `max_tokens=200000` for any audio longer than 20 minutes.
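A rough pre-flight estimate makes the trap visible before running inference. The ~200 tokens/min rate below is derived from the 123-minute Chinese benchmark above (24,337 tokens / 123 min ≈ 198) and will vary by language and speech density; the helper names are a sketch, not part of the skill:

```python
DEFAULT_BUDGET = 8192  # mlx-audio's generate() default

def estimate_tokens(audio_minutes: float, tokens_per_min: float = 200.0) -> int:
    """Rough output-token estimate for a transcript of the given length."""
    return int(audio_minutes * tokens_per_min)

def truncation_risk(audio_minutes: float) -> bool:
    """True when the default budget would silently drop the tail of the audio."""
    return estimate_tokens(audio_minutes) > DEFAULT_BUDGET
```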
## Model Weight Compatibility
Two MLX packages exist for Qwen3-ASR. Their weight formats are incompatible:
| Package | Use with | Weight Format |
|---|---|---|
| mlx-audio (Blaizzy) | mlx-community/Qwen3-ASR-1.7B-8bit | mlx-audio quantization (audio_tower quantized) |
| mlx-qwen3-asr (moona3k) | Qwen/Qwen3-ASR-1.7B | Own loader (audio_tower NOT quantized) |
Loading one package with the other's checkpoint produces a "Missing 297 parameters" error. This skill uses mlx-audio.
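One way to fail fast, rather than hitting the parameter mismatch deep inside model loading, is to validate the pairing up front. The table data is from above; the helper is a sketch, not part of either package:

```python
# Known-good package ↔ checkpoint pairings from the compatibility table
COMPATIBLE = {
    "mlx-audio": "mlx-community/Qwen3-ASR-1.7B-8bit",
    "mlx-qwen3-asr": "Qwen/Qwen3-ASR-1.7B",
}

def check_pairing(package: str, model_id: str) -> None:
    """Raise early if a package is pointed at the other package's weights."""
    expected = COMPATIBLE.get(package)
    if expected is not None and model_id != expected:
        raise ValueError(
            f"{model_id} is incompatible with {package}; use {expected} "
            "(mismatched weights fail with 'Missing 297 parameters')"
        )
```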
## Alternatives Not Recommended
| Approach | Issue |
|---|---|
| PyTorch MPS (qwen-asr package) | 97.77% time in GPU↔CPU sync, RTF 5.5-24.5x |
| whisper.cpp large-v3-turbo | High Chinese error rate |
| Official qwen-asr on macOS | Designed for CUDA only |