feat(transcript-fixer): add timestamp repair and section splitting scripts

New scripts:
- fix_transcript_timestamps.py: Repair malformed timestamps (HH:MM:SS format)
- split_transcript_sections.py: Split transcript by keywords and rebase timestamps
- Automated tests for both scripts

Features:
- Timestamp validation and repair (handle missing colons, invalid ranges)
- Section splitting with custom names
- Rebase timestamps to 00:00:00 for each section
- Preserve speaker format and content integrity
- In-place editing with backup

Documentation updates:
- Add usage examples to SKILL.md
- Clarify dictionary iteration workflow (save stable patterns only)
- Update workflow guides with new script references
- Add script parameter documentation

Use cases:
- Fix ASR output with broken timestamps
- Split long meetings into focused sections
- Prepare sections for independent processing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -17,6 +17,7 @@ Detailed step-by-step workflows for transcript correction and management.
- [5. Stage-by-Stage Execution](#5-stage-by-stage-execution)
- [6. Context-Aware Rules](#6-context-aware-rules)
- [7. Diff Report Generation](#7-diff-report-generation)
- [8. Workshop Transcript Split + Timestamp Rebase](#8-workshop-transcript-split--timestamp-rebase)
- [Batch Processing](#batch-processing)
- [Process Multiple Files](#process-multiple-files)
- [Parallel Processing](#parallel-processing)
@@ -400,6 +401,30 @@ See `file_formats.md` for context_rules schema.
See `script_parameters.md` for advanced diff options.

### 8. Workshop Transcript Split + Timestamp Rebase

**Goal**: Split a long workshop transcript into sections such as setup chat, class, and debrief, then make each section start from `00:00:00`.

**Steps**:

1. **Correct transcript text first** (dictionary + AI/manual review)
2. **Pick marker phrases** for each section boundary
3. **Split and rebase**:

```bash
uv run scripts/split_transcript_sections.py workshop.txt \
  --first-section-name "课前聊天" \
  --section "正式上课::好,无缝切换嘛。对。那个曹总连上了吗?那个网页。" \
  --section "课后复盘::我们复盘一下。" \
  --rebase-to-zero
```

4. **If you already split the files**, rebase a single file directly:

```bash
uv run scripts/fix_transcript_timestamps.py class.txt --in-place --rebase-to-zero
```
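The `--section` values in step 3 appear to follow a `NAME::MARKER` pattern; the section names in that example mean roughly "pre-class chat", "class proper", and "post-class review". As a rough illustration of marker-based splitting, the sketch below assumes a new section begins at the first line containing its marker phrase. It is not the actual logic of `split_transcript_sections.py`, which this diff does not show. The function `split_by_markers`, the `[HH:MM:SS] Speaker: text` line shape, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_markers(
    lines: List[str], first_name: str, specs: List[str]
) -> Dict[str, List[str]]:
    """Split lines into named sections (illustrative only, not the real script).

    Each spec is "NAME::MARKER". A new section is assumed to start at the
    first line, after the previous boundary, that contains MARKER.
    """
    # Parse "NAME::MARKER" into (name, marker) pairs, keeping their order.
    pending: List[Tuple[str, str]] = []
    for spec in specs:
        name, _, marker = spec.partition("::")
        pending.append((name, marker))

    sections: Dict[str, List[str]] = {first_name: []}
    current = first_name
    for line in lines:
        # Only the next expected marker is checked, so sections stay in order.
        if pending and pending[0][1] in line:
            current = pending.pop(0)[0]
            sections[current] = []
        sections[current].append(line)
    return sections


# Hypothetical sample transcript; the real line format may differ.
transcript = [
    "[00:00:05] A: Can everyone hear me?",
    "[00:12:30] A: OK, let's begin the class.",
    "[00:13:10] B: Ready.",
    "[01:05:00] A: Let's debrief.",
]
parts = split_by_markers(
    transcript,
    first_name="setup",
    specs=["class::let's begin the class", "debrief::Let's debrief"],
)
for name, body in parts.items():
    print(name, len(body))
```

Matching only the next pending marker keeps sections in the order given on the command line, which is one plausible reason to pick long, distinctive marker phrases in step 2: a short phrase could match too early and move the boundary.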
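To make the effect of `--rebase-to-zero` concrete, here is a minimal sketch of the arithmetic, assuming timestamps appear as `HH:MM:SS` and that the first timestamp in the file becomes the new zero point. The helper names (`to_seconds`, `format_hms`, `rebase_to_zero`) are hypothetical, and the real `fix_transcript_timestamps.py` may handle edge cases differently.

```python
import re
from typing import List, Optional

# Assumed timestamp shape: two-digit HH:MM:SS anywhere in a line.
TIMESTAMP = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def to_seconds(match: re.Match) -> int:
    """Convert a matched HH:MM:SS timestamp to total seconds."""
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def rebase_to_zero(lines: List[str]) -> List[str]:
    """Shift every timestamp so the first one found becomes 00:00:00."""
    offset: Optional[int] = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            offset = to_seconds(match)
            break
    if offset is None:
        return list(lines)  # no timestamps, nothing to rebase

    def shift(match: re.Match) -> str:
        # Clamp at zero so an out-of-order earlier timestamp never goes negative.
        return format_hms(max(0, to_seconds(match) - offset))

    return [TIMESTAMP.sub(shift, line) for line in lines]


# Hypothetical section that starts 12.5 minutes into the recording.
section = ["[00:12:30] A: OK, let's begin.", "[00:13:10] B: Ready."]
print(rebase_to_zero(section))
```

Only the timestamps are rewritten; speaker labels and text pass through unchanged, which matches the commit's stated goal of preserving speaker format and content.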
## Batch Processing

### Process Multiple Files