New scripts: - fix_transcript_timestamps.py: Repair malformed timestamps (HH:MM:SS format) - split_transcript_sections.py: Split transcript by keywords and rebase timestamps - Automated tests for both scripts Features: - Timestamp validation and repair (handle missing colons, invalid ranges) - Section splitting with custom names - Rebase timestamps to 00:00:00 for each section - Preserve speaker format and content integrity - In-place editing with backup Documentation updates: - Add usage examples to SKILL.md - Clarify dictionary iteration workflow (save stable patterns only) - Update workflow guides with new script references - Add script parameter documentation Use cases: - Fix ASR output with broken timestamps - Split long meetings into focused sections - Prepare sections for independent processing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6.6 KiB
Script Parameters Reference
Detailed command-line parameters and usage examples for transcript-fixer Python scripts.
Table of Contents
- fix_transcription.py - Main correction pipeline
- fix_transcript_timestamps.py - Normalize/repair speaker timestamps
- split_transcript_sections.py - Split transcript into named sections
- diff_generator.py - Generate comparison reports
- Common Workflows
- Exit Codes
- Environment Variables
fix_transcription.py
Main correction pipeline script supporting three processing stages.
Syntax
python scripts/fix_transcription.py --input <file> --stage <1|2|3> [--output <dir>]
Parameters
--input, -i(required): Input Markdown file path--stage, -s(optional): Stage to execute (default: 3)1= Dictionary corrections only2= AI corrections only (requires Stage 1 output file)3= Both stages sequentially
--output, -o(optional): Output directory (defaults to input file directory)
Usage Examples
Run dictionary corrections only:
python scripts/fix_transcription.py --input meeting.md --stage 1
Output: meeting_阶段1_词典修复.md
Run AI corrections only:
python scripts/fix_transcription.py --input meeting_阶段1_词典修复.md --stage 2
Output: meeting_阶段2_AI修复.md
Note: Requires Stage 1 output file as input.
Run complete pipeline:
python scripts/fix_transcription.py --input meeting.md --stage 3
Outputs:
meeting_阶段1_词典修复.mdmeeting_阶段2_AI修复.md
Custom output directory:
python scripts/fix_transcription.py --input meeting.md --stage 3 --output ./corrections
Exit Codes
0- Success1- Missing required parameters or file not found2- GLM_API_KEY environment variable not set (Stage 2 or 3 only)3- API request failed
fix_transcript_timestamps.py
Normalize speaker timestamp lines such as 天生 00:21 or Speaker 7 01:31:10.
Syntax
python scripts/fix_transcript_timestamps.py <file> [--output FILE | --in-place | --check]
Key Parameters
--format {hhmmss,preserve}: output timestamp style--rebase-to-zero: reset the first detected speaker timestamp to00:00:00--rollover-backjump-seconds: threshold for treating59:58 -> 00:05as a new hour--jitter-seconds: tolerated small backward jitter before flagging anomaly
Usage Examples
# Normalize mixed MM:SS / HH:MM:SS
python scripts/fix_transcript_timestamps.py meeting.txt --in-place
# Rebase a split transcript so it starts at 00:00:00
python scripts/fix_transcript_timestamps.py workshop-class.txt --in-place --rebase-to-zero
# Only inspect anomalies, do not write
python scripts/fix_transcript_timestamps.py meeting.txt --check
split_transcript_sections.py
Split a transcript into named sections using marker phrases. Useful for workshop transcripts that include setup chat, class, and debrief in one file.
Syntax
python scripts/split_transcript_sections.py <file> \
--first-section-name <name> \
--section "Name::Marker" \
--section "Name::Marker"
Usage Example
python scripts/split_transcript_sections.py workshop.txt \
--first-section-name "课前聊天" \
--section "正式上课::好,无缝切换嘛。对。那个曹总连上了吗?那个网页。" \
--section "课后复盘::我们复盘一下。" \
--rebase-to-zero
generate_diff_report.py
Multi-format diff report generator for comparing correction stages.
Syntax
python scripts/generate_diff_report.py --original <file> --stage1 <file> --stage2 <file> [--output-dir <dir>]
Parameters
--original(required): Original transcript file path--stage1(required): Stage 1 correction output file path--stage2(required): Stage 2 correction output file path--output-dir(optional): Output directory for diff reports (defaults to original file directory)
Usage Examples
Basic usage:
python scripts/generate_diff_report.py \
--original "meeting.md" \
--stage1 "meeting_阶段1_词典修复.md" \
--stage2 "meeting_阶段2_AI修复.md"
Custom output directory:
python scripts/generate_diff_report.py \
--original "meeting.md" \
--stage1 "meeting_阶段1_词典修复.md" \
--stage2 "meeting_阶段2_AI修复.md" \
--output-dir "./reports"
Output Files
The script generates four comparison formats:
-
Markdown summary (
*_对比报告.md)- High-level statistics and change summary
- Word count changes per stage
- Common error patterns identified
-
Unified diff (
*_unified.diff)- Traditional Unix diff format
- Suitable for command-line review or version control
-
HTML side-by-side (
*_对比.html)- Visual side-by-side comparison
- Color-coded additions/deletions
- Recommended for human review
-
Inline marked (
*_行内对比.txt)- Single-column format with inline change markers
- Useful for quick text editor review
Exit Codes
0- Success1- Missing required parameters or file not found2- File format error (non-Markdown input)
Common Workflows
Testing Dictionary Changes
Test dictionary updates before running expensive AI corrections:
# 1. Update CORRECTIONS_DICT in scripts/fix_transcription.py
# 2. Run Stage 1 only
python scripts/fix_transcription.py --input meeting.md --stage 1
# 3. Review output
cat meeting_阶段1_词典修复.md
# 4. If satisfied, run Stage 2
python scripts/fix_transcription.py --input meeting_阶段1_词典修复.md --stage 2
Batch Processing
Process multiple transcripts in sequence:
for file in transcripts/*.md; do
python scripts/fix_transcription.py --input "$file" --stage 3
done
Quick Review Cycle
Generate and open comparison report immediately after correction:
# Run corrections
python scripts/fix_transcription.py --input meeting.md --stage 3
# Generate and open diff report
python scripts/generate_diff_report.py \
--original "meeting.md" \
--stage1 "meeting_阶段1_词典修复.md" \
--stage2 "meeting_阶段2_AI修复.md"
open meeting_对比.html # macOS
# xdg-open meeting_对比.html # Linux
# start meeting_对比.html # Windows