feat(transcript-fixer): native AI correction as default mode (v1.3.0)

Claude IS the AI — when running inside Claude Code, use Claude's own language understanding for Stage 2 corrections instead of calling an external API. No API key needed by default. New capabilities in native mode: - Intelligent paragraph breaks at logical topic transitions - Filler word reduction (excessive repetition removal) - Interactive review with confidence-level tables - Context-aware judgment using full document context API mode (GLM) remains available for batch/automation use cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:40:38 +08:00
parent a496c91cae
commit 639a6d303e
2 changed files with 93 additions and 40 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -6,7 +6,7 @@
  },
  "metadata": {
    "description": "Professional Claude Code skills for GitHub operations, document conversion, diagram generation, statusline customization, Teams communication, repomix utilities, skill creation, CLI demo generation, LLM icon access, Cloudflare troubleshooting, UI design system extraction, professional presentation creation, YouTube video downloading, secure repomix packaging, ASR transcription correction, video comparison quality analysis, comprehensive QA testing infrastructure, prompt optimization with EARS methodology, session history recovery, local Claude session continuation from `.claude` artifacts, documentation cleanup, format-controlled deep research report generation with evidence tracking, PDF generation with Chinese font support, CLAUDE.md progressive disclosure optimization, CCPM skill registry search and management, Promptfoo LLM evaluation framework, iOS app development with XcodeGen and SwiftUI, fact-checking with automated corrections, Twitter/X content fetching, intelligent macOS disk space recovery, skill quality review and improvement, GitHub contribution strategy, complete internationalization/localization setup, plugin/skill troubleshooting with diagnostic tools, evidence-based competitor analysis with source citations, Windows Remote Desktop (AVD/W365) connection quality diagnosis with transport protocol analysis and log parsing, Tailscale+proxy conflict diagnosis with SSH tunnel SOP for remote development, multi-path parallel product analysis with cross-model test-time compute scaling, real financial data collection for US equities with validation and yfinance pitfall handling, advanced Excel automation for formatted workbook generation and complex xlsm parsing, macOS programmatic window screenshot capture workflows, and verified Scrapling CLI installation and web extraction workflows",
-    "version": "1.39.0",
+    "version": "1.40.0",
    "homepage": "https://github.com/daymade/claude-code-skills"
  },
  "plugins": [
@@ -15,7 +15,7 @@
      "description": "Essential meta-skill for creating effective Claude Code skills with initialization scripts, validation, packaging, marketplace registration, and privacy best practices",
      "source": "./",
      "strict": false,
-      "version": "1.5.1",
+      "version": "1.6.0",
      "category": "developer-tools",
      "keywords": [
        "skill-creation",
@@ -289,10 +289,10 @@
    },
    {
      "name": "transcript-fixer",
-      "description": "Corrects speech-to-text (ASR/STT) transcription errors in meeting notes, lecture recordings, interviews, and voice memos through dictionary-based rules and AI corrections. Supports Chinese domain names, AI fallback to Claude Code, and iterative dictionary building. Use when users mention transcript correction, ASR errors, speech-to-text mistakes, homophone errors, or working with transcription files",
+      "description": "Corrects speech-to-text (ASR/STT) transcription errors using dictionary rules and native Claude AI corrections (no API key needed by default). Supports intelligent paragraph breaks, filler word reduction, interactive review, Chinese domain names, and iterative dictionary building. Use when users mention transcript correction, ASR errors, speech-to-text mistakes, homophone errors, or working with transcription files",
      "source": "./",
      "strict": false,
-      "version": "1.2.0",
+      "version": "1.3.0",
      "category": "productivity",
      "keywords": [
        "transcription",
@@ -919,6 +919,27 @@
      "skills": [
        "./scrapling-skill"
      ]
+    },
+    {
+      "name": "asr-transcribe-to-text",
+      "description": "Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or OpenAI-compatible endpoint). Extracts audio from video, sends to configurable ASR endpoint, outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字, or has a meeting recording, lecture, interview, or screen recording to transcribe.",
+      "source": "./",
+      "strict": false,
+      "version": "1.0.0",
+      "category": "productivity",
+      "keywords": [
+        "asr",
+        "transcription",
+        "speech-to-text",
+        "qwen",
+        "audio",
+        "video",
+        "vllm",
+        "ffmpeg"
+      ],
+      "skills": [
+        "./asr-transcribe-to-text"
+      ]
    }
  ]
 }
--- a/transcript-fixer/SKILL.md
+++ b/transcript-fixer/SKILL.md
@@ -29,20 +29,36 @@ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | ie

 ## Quick Start

-**Recommended: Use Enhanced Wrapper** (auto-detects API key, opens HTML diff):
+**Default: Native AI Correction (no API key needed)**
+
+When invoked from Claude Code, the skill uses a two-phase approach:
+1. **Dictionary phase** (script): Apply 700+ learned correction rules instantly
+2. **AI phase** (Claude native): Claude reads the text directly and fixes ASR errors, adds paragraph breaks, removes filler words

 ```bash
 # First time: Initialize database
 uv run scripts/fix_transcription.py --init

-# Process transcript with enhanced UX
-uv run scripts/fix_transcript_enhanced.py input.md --output ./corrected
+# Phase 1: Dictionary corrections (instant, free)
+uv run scripts/fix_transcription.py --input meeting.md --stage 1
 ```

-The enhanced wrapper automatically:
- Detects GLM API key from shell configs (checks lines near `ANTHROPIC_BASE_URL`)
- Moves output files to specified directory
- Opens HTML visual diff in browser for immediate feedback
+After Stage 1, Claude should:
+1. Read the Stage 1 output in ~3000-char chunks
+2. Identify ASR errors (homophones, technical terms, broken sentences)
+3. Present corrections in a table for user review (high/medium confidence)
+4. Apply confirmed corrections and save stable patterns to dictionary
+5. Optionally: add paragraph breaks and remove excessive filler words
+
+**Alternative: API-Based Batch Processing** (for automation or large volumes):
+
+```bash
+# Set API key for automated AI corrections
+export GLM_API_KEY="<api-key>"  # From https://open.bigmodel.cn/
+
+# Run full pipeline (dict + API AI + diff report)
+uv run scripts/fix_transcript_enhanced.py input.md --output ./corrected
+```

 **Timestamp repair**:
 ```bash
@@ -58,25 +74,9 @@ uv run scripts/split_transcript_sections.py meeting.txt \
  --rebase-to-zero
 ```

-**Alternative: Use Core Script Directly**:
-
-```bash
-# 1. Set API key (if not auto-detected)
-export GLM_API_KEY="<api-key>"  # From https://open.bigmodel.cn/
-
-# 2. Add common corrections (5-10 terms)
-uv run scripts/fix_transcription.py --add "错误词" "正确词" --domain general
-
-# 3. Run full correction pipeline
-uv run scripts/fix_transcription.py --input meeting.md --stage 3
-
-# 4. Review learned patterns after 3-5 runs
-uv run scripts/fix_transcription.py --review-learned
-```
-
 **Output files**:
 - `*_stage1.md` - Dictionary corrections applied
- `*_stage2.md` - AI corrections applied (final version)
+- `*_corrected.txt` - Final version (native mode) or `*_stage2.md` (API mode)
 - `*_对比.html` - Visual diff (open in browser for best experience)

 **Generate word-level diff** (recommended for reviewing corrections):
@@ -116,14 +116,15 @@ This creates an HTML file showing word-by-word differences with clear highlighti

 ## Core Workflow

-Three-stage pipeline stores corrections in `~/.transcript-fixer/corrections.db`:
+Two-phase pipeline stores corrections in `~/.transcript-fixer/corrections.db`:

 1. **Initialize** (first time): `uv run scripts/fix_transcription.py --init`
 2. **Add domain corrections**: `--add "错误词" "正确词" --domain <domain>`
-3. **Process transcript**: `--input file.md --stage 3`
-4. **Review learned patterns**: `--review-learned` and `--approve` high-confidence suggestions
+3. **Phase 1 — Dictionary**: `--input file.md --stage 1` (instant, free)
+4. **Phase 2 — AI Correction**: Claude reads output and fixes ASR errors natively (default), or use `--stage 3` with `GLM_API_KEY` for API mode
+5. **Save stable patterns**: `--add "错误词" "正确词"` after each fix session
+6. **Review learned patterns**: `--review-learned` and `--approve` high-confidence suggestions

-**Stages**: Dictionary (instant, free) → AI via GLM API (parallel) → Full pipeline
 **Domains**: `general`, `embodied_ai`, `finance`, `medical`, or custom names including Chinese (e.g., `火星加速器`, `具身智能`)
 **Learning**: Patterns appearing ≥3 times at ≥80% confidence move from AI to dictionary

@@ -182,14 +183,45 @@ If you understand the risks and still want to add a flagged rule:
 uv run scripts/fix_transcription.py --add "仿佛" "反复" --domain general --force
 ```

-## AI Fallback Strategy
+## Native AI Correction (Default Mode)

-When GLM API is unavailable (503, network issues), the script outputs `[CLAUDE_FALLBACK]` marker.
+**Claude IS the AI.** When running inside Claude Code, use Claude's own language understanding for Stage 2 corrections instead of calling an external API. This is the default behavior — no API key needed.

-Claude Code should then:
-1. Analyze the text directly for ASR errors
-2. Fix using Edit tool
-3. **MUST save corrections to dictionary** with `--add`
+### Workflow
+
+1. **Run Stage 1** (dictionary): `uv run scripts/fix_transcription.py --input file.md --stage 1`
+2. **Read the text** in ~3000-character chunks (use `cut -c<start>-<end>` for single-line files)
+3. **Identify ASR errors** — look for:
+   - Homophone errors (同音字): "上海文" → "上下文", "扩种" → "扩充"
+   - Broken sentence boundaries: "很大程。路上" → "很大程度上"
+   - Technical terms: "Web coding" → "Vibe Coding"
+   - Missing/extra characters: "沉沉默" → "沉默"
+4. **Present corrections** in a table with confidence levels before applying:
+   - High confidence: clear ASR errors with unambiguous corrections
+   - Medium confidence: context-dependent, need user confirmation
+5. **Apply corrections** to a copy of the file (never modify the original)
+6. **Save stable patterns** to dictionary: `--add "错误词" "正确词" --domain general`
+7. **Generate word diff**: `uv run scripts/generate_word_diff.py original.md corrected.md diff.html`
+
+### Enhanced AI Capabilities (Native Mode Only)
+
+Native mode can do things the API mode cannot:
+
+- **Intelligent paragraph breaks**: Add `\n\n` at logical topic transitions in continuous text
+- **Filler word reduction**: Remove excessive repetition (这个这个这个 → 这个, 都都都都 → 都)
+- **Interactive review**: Present corrections for user confirmation before applying
+- **Context-aware judgment**: Use full document context to resolve ambiguous errors
+
+### When to Use API Mode Instead
+
+Use `GLM_API_KEY` + Stage 3 for:
+- Batch processing multiple files in automation
+- When Claude Code is not available (standalone script usage)
+- Consistent reproducible processing without interactive review
+
+### Legacy Fallback Marker
+
+When the script outputs `[CLAUDE_FALLBACK]` (GLM API error), switch to native mode automatically.

 ## Database Operations

@@ -209,8 +241,8 @@ sqlite3 ~/.transcript-fixer/corrections.db "SELECT value FROM system_config WHER
 | Stage | Description | Speed | Cost |
 |-------|-------------|-------|------|
 | 1 | Dictionary only | Instant | Free |
-| 2 | AI only | ~10s | API calls |
-| 3 | Full pipeline | ~10s | API calls |
+| 1 + Native | Dictionary + Claude AI (default) | ~1min | Free |
+| 3 | Dictionary + API AI + diff report | ~10s | API calls |

 ## Bundled Resources