asr-transcribe-to-text:
- Add local MLX transcription path (macOS Apple Silicon, 15-27x realtime)
- Add bundled script transcribe_local_mlx.py with max_tokens=200000
- Add local_mlx_guide.md with benchmarks and truncation trap docs
- Auto-detect platform and recommend local vs remote mode
- Fix audio extraction format (MP3 → WAV 16kHz mono PCM)
- Add Step 5: recommend transcript-fixer after transcription

transcript-fixer:
- Optimize SKILL.md from 289 → 153 lines (best practices compliance)
- Move FALSE_POSITIVE_RISKS (40 lines) to references/false_positive_guide.md
- Move Example Session to references/example_session.md
- Improve description for better triggering (226 → 580 chars)
- Add handoff to meeting-minutes-taker

skill-creator:
- Add "Pipeline Handoff" pattern to Skill Writing Guide
- Add pipeline check reminder in Step 4 (Edit the Skill)

Pipeline handoffs added to 8 skills forming 6 chains:
- youtube-downloader → asr-transcribe-to-text → transcript-fixer → meeting-minutes-taker → pdf/ppt-creator
- deep-research → fact-checker → pdf/ppt-creator
- doc-to-markdown → docs-cleaner / fact-checker
- claude-code-history-files-finder → continue-claude-work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Local MLX Transcription Guide
## Platform Requirements
- macOS on Apple Silicon (M1/M2/M3/M4/M5+)
- Python 3.10+
- uv package manager
- ~3GB disk for model weights (first download)
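The skill auto-detects whether the local path is viable before recommending it. A minimal version of that check can be written with the standard library; the helper name below is illustrative, not the bundled script's actual API:

```python
import platform

def local_mlx_supported() -> bool:
    """True on macOS running Apple Silicon, where MLX inference is available."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

# Fall back to the remote transcription path everywhere else.
mode = "local" if local_mlx_supported() else "remote"
```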
## Recommended Configuration
| Setting | Value | Why |
|---|---|---|
| Model | mlx-community/Qwen3-ASR-1.7B-8bit | 8-bit quantized, fast inference, good quality |
| max_tokens | 200000 | Default 8192 silently truncates audio >40 min |
| Audio format | WAV 16kHz mono PCM | Best compatibility with ASR models |
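The MP3 → WAV 16kHz mono PCM conversion maps to a standard ffmpeg invocation. A sketch of the command builder (hypothetical helper name; assumes ffmpeg is on PATH):

```python
def ffmpeg_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command converting any input to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # single (mono) channel
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]

# Requires ffmpeg installed:
# subprocess.run(ffmpeg_wav_cmd("talk.mp3", "talk.wav"), check=True)
```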
## Performance Benchmarks (M5 Pro 48GB, April 2026)
| Audio Length | Inference Time | Speed | Chars | Tokens |
|---|---|---|---|---|
| 1 min | 3.7s | 16x realtime | 295 | ~180 |
| 5 min | 11.1s | 27x realtime | 1,633 | ~980 |
| 15 min | 50.5s | 17.8x realtime | 5,074 | ~3,045 |
| 123 min | 502s (8m22s) | 14.7x realtime | 40,347 | 24,337 |
| 96 min | 409s (6m48s) | 14.1x realtime | 30,018 | 18,214 |
Model load: ~4s (cached), ~130s (first download).
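The speed column is simply audio duration divided by inference time; recomputing it from the table rows:

```python
def realtime_factor(audio_minutes: float, inference_seconds: float) -> float:
    """Speedup over realtime: audio duration / inference time."""
    return audio_minutes * 60 / inference_seconds

# Rows from the benchmark table above
assert round(realtime_factor(123, 502), 1) == 14.7
assert round(realtime_factor(96, 409), 1) == 14.1
```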
## Critical: `max_tokens` Truncation
The `model.generate()` method in mlx-audio defaults to `max_tokens=8192`. This budget is global, shared across all audio chunks rather than allocated per chunk; once it is exhausted, the remaining chunks are silently skipped.
For 123 minutes of Chinese speech:
- Required: ~24,000 tokens
- Default budget: 8,192 tokens
- Result: only first ~40 minutes transcribed, rest silently dropped
Always pass `max_tokens=200000` for any audio longer than 20 minutes.
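A rough pre-flight estimate makes the trap visible before running inference. The ~200 tokens/min rate below is derived from the 123-minute Chinese benchmark above (24,337 tokens / 123 min ≈ 198) and will vary by language and speech density; the helper names are a sketch, not part of the skill:

```python
DEFAULT_BUDGET = 8192  # mlx-audio's generate() default

def estimate_tokens(audio_minutes: float, tokens_per_min: float = 200.0) -> int:
    """Rough output-token estimate for a transcript of the given length."""
    return int(audio_minutes * tokens_per_min)

def truncation_risk(audio_minutes: float) -> bool:
    """True when the default budget would silently drop the tail of the audio."""
    return estimate_tokens(audio_minutes) > DEFAULT_BUDGET
```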
## Model Weight Compatibility
Two MLX packages exist for Qwen3-ASR. Their weight formats are incompatible:
| Package | Use with | Weight Format |
|---|---|---|
| mlx-audio (Blaizzy) | mlx-community/Qwen3-ASR-1.7B-8bit | mlx-audio quantization (audio_tower quantized) |
| mlx-qwen3-asr (moona3k) | Qwen/Qwen3-ASR-1.7B | Own loader (audio_tower NOT quantized) |
Loading one package with the other's checkpoint produces a "Missing 297 parameters" error. This skill uses mlx-audio.
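One way to fail fast, rather than hitting the parameter mismatch deep inside model loading, is to validate the pairing up front. The table data is from above; the helper is a sketch, not part of either package:

```python
# Known-good package ↔ checkpoint pairings from the compatibility table
COMPATIBLE = {
    "mlx-audio": "mlx-community/Qwen3-ASR-1.7B-8bit",
    "mlx-qwen3-asr": "Qwen/Qwen3-ASR-1.7B",
}

def check_pairing(package: str, model_id: str) -> None:
    """Raise early if a package is pointed at the other package's weights."""
    expected = COMPATIBLE.get(package)
    if expected is not None and model_id != expected:
        raise ValueError(
            f"{model_id} is incompatible with {package}; use {expected} "
            "(mismatched weights fail with 'Missing 297 parameters')"
        )
```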
## Alternatives Not Recommended
| Approach | Issue |
|---|---|
| PyTorch MPS (qwen-asr package) | 97.77% time in GPU↔CPU sync, RTF 5.5-24.5x |
| whisper.cpp large-v3-turbo | High Chinese error rate |
| Official qwen-asr on macOS | Designed for CUDA only |