feat(transcript-fixer): add timestamp repair and section splitting scripts

New scripts:
- fix_transcript_timestamps.py: Repair malformed timestamps (HH:MM:SS format)
- split_transcript_sections.py: Split transcript by keywords and rebase timestamps
- Automated tests for both scripts

Features:
- Timestamp validation and repair (handle missing colons, invalid ranges)
- Section splitting with custom names
- Rebase each section's timestamps so the section starts at 00:00:00
- Preserve speaker format and content integrity
- In-place editing with backup
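
To illustrate the repair feature listed above, here is a minimal sketch of normalizing a malformed timestamp to HH:MM:SS. It is not the code in fix_transcript_timestamps.py: the function name `repair_timestamp` and the specific rules (pad a 4- or 6-digit run, treat two fields as MM:SS, reject minutes or seconds above 59) are assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; the real script may use different rules.
_DIGIT_RUNS = re.compile(r"\d+")


def repair_timestamp(raw: str) -> Optional[str]:
    """Return `raw` normalized to HH:MM:SS, or None if it cannot be repaired."""
    parts = _DIGIT_RUNS.findall(raw)

    if len(parts) == 1:
        # Missing colons (assumed rule): "012345" -> 01:23:45, "2345" -> 00:23:45.
        digits = parts[0]
        if len(digits) not in (4, 6):
            return None
        digits = digits.zfill(6)
        parts = [digits[0:2], digits[2:4], digits[4:6]]
    elif len(parts) == 2:
        # Two fields are assumed to be MM:SS with the hour omitted.
        parts = ["0"] + parts
    elif len(parts) != 3:
        return None

    hours, minutes, seconds = (int(p) for p in parts)
    # Invalid ranges are rejected rather than guessed at.
    if minutes > 59 or seconds > 59:
        return None
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Returning `None` for out-of-range values, instead of clamping them, keeps the sketch from silently inventing a time; a caller could log those lines for manual review.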

Documentation updates:
- Add usage examples to SKILL.md
- Clarify dictionary iteration workflow (save stable patterns only)
- Update workflow guides with new script references
- Add script parameter documentation

Use cases:
- Fix ASR output with broken timestamps
- Split long meetings into focused sections
- Prepare sections for independent processing
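
The rebasing step behind these use cases can be sketched as follows. This is an assumed approach, not the logic of split_transcript_sections.py: `rebase_section` is a hypothetical name, and the sketch assumes every timestamp is already a valid HH:MM:SS and that the first timestamp in the section marks its start.

```python
import re
from typing import List

# Matches well-formed HH:MM:SS timestamps only (assumption: repair ran first).
_TS = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def _to_seconds(match: "re.Match[str]") -> int:
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def _format(total: int) -> str:
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def rebase_section(lines: List[str]) -> List[str]:
    """Shift every timestamp so the first one in the section becomes 00:00:00."""
    first = next((m for line in lines for m in [_TS.search(line)] if m), None)
    if first is None:
        return list(lines)  # nothing to rebase
    offset = _to_seconds(first)
    # Clamp at zero so an out-of-order timestamp never goes negative.
    return [
        _TS.sub(lambda m: _format(max(_to_seconds(m) - offset, 0)), line)
        for line in lines
    ]
```

Only the timestamp text is substituted, so speaker labels and the rest of each line pass through unchanged.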

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
daymade
2026-03-11 13:59:36 +08:00
parent 29f85d27c3
commit 135a1873af
8 changed files with 688 additions and 4 deletions


@@ -16,7 +16,7 @@ The core value of transcript-fixer is building a personalized correction diction
└─────────────────────────────────────────────────┘
```
-**Key principle**: Every correction you make should be saved to the dictionary. This transforms one-time work into permanent value.
+**Key principle**: Every stable, reusable ASR correction you make should be saved to the dictionary. This transforms one-time work into permanent value without polluting the database.
## Workflow Checklist
@@ -34,7 +34,7 @@ Correction Progress:
## Save Corrections Immediately
-After fixing any transcript, save corrections:
+After fixing any transcript, save stable corrections:
```bash
# Single correction
@@ -122,3 +122,12 @@ Patterns appearing ≥3 times at ≥80% confidence are suggested for review.
3. **Use domains**: Organize corrections by topic for better precision
4. **Verify**: Always run --list to confirm saves
5. **Review suggestions**: Periodically check --review-learned for auto-detected patterns
+## What NOT to Save to Dictionary
+Do **not** save these as reusable dictionary entries:
+- Full-sentence deletions
+- One-off section headers or meeting-specific boilerplate
+- Context-only disambiguations such as `cloud -> Claude` when `cloud` can also be legitimate
+- File-local cleanup after section splitting or timestamp rebasing