## New Skill: transcript-fixer v1.0.0 Correct speech-to-text (ASR/STT) transcription errors through dictionary-based rules and AI-powered corrections with automatic pattern learning. **Features:** - Two-stage correction pipeline (dictionary + AI) - Automatic pattern detection and learning - Domain-specific dictionaries (general, embodied_ai, finance, medical) - SQLite-based correction repository - Team collaboration with import/export - GLM API integration for AI corrections - Cost optimization through dictionary promotion **Use cases:** - Correcting meeting notes, lecture recordings, or interview transcripts - Fixing Chinese/English homophone errors and technical terminology - Building domain-specific correction dictionaries - Improving transcript accuracy through iterative learning **Documentation:** - Complete workflow guides in references/ - SQL query templates - Troubleshooting guide - Team collaboration patterns - API setup instructions **Marketplace updates:** - Updated marketplace to v1.8.0 - Added transcript-fixer plugin (category: productivity) - Updated README.md with skill description and use cases - Updated CLAUDE.md with skill listing and counts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
372 lines
9.9 KiB
Markdown
372 lines
9.9 KiB
Markdown
# Team Collaboration Guide
|
|
|
|
This guide explains how to share correction knowledge across teams using export/import and Git workflows.
|
|
|
|
## Table of Contents
|
|
|
|
- [Export/Import Workflow](#exportimport-workflow)
|
|
- [Export Corrections](#export-corrections)
|
|
- [Import from Teammate](#import-from-teammate)
|
|
- [Team Workflow Example](#team-workflow-example)
|
|
- [Git-Based Collaboration](#git-based-collaboration)
|
|
- [Initial Setup](#initial-setup)
|
|
- [Team Members Clone](#team-members-clone)
|
|
- [Ongoing Sync](#ongoing-sync)
|
|
- [Handling Conflicts](#handling-conflicts)
|
|
- [Selective Domain Sharing](#selective-domain-sharing)
|
|
- [Finance Team](#finance-team)
|
|
- [AI Team](#ai-team)
|
|
- [Individual imports specific domains](#individual-imports-specific-domains)
|
|
- [Git Branching Strategy](#git-branching-strategy)
|
|
- [Feature Branches](#feature-branches)
|
|
- [Domain Branches (Alternative)](#domain-branches-alternative)
|
|
- [Automated Sync (Advanced)](#automated-sync-advanced)
|
|
- [macOS/Linux Cron](#macoslinux-cron)
|
|
- [Windows Task Scheduler](#windows-task-scheduler)
|
|
- [Backup and Recovery](#backup-and-recovery)
|
|
- [Backup Strategy](#backup-strategy)
|
|
- [Recovery from Backup](#recovery-from-backup)
|
|
- [Recovery from Git](#recovery-from-git)
|
|
- [Team Best Practices](#team-best-practices)
|
|
- [Integration with CI/CD](#integration-with-cicd)
|
|
- [GitHub Actions Example](#github-actions-example)
|
|
- [Troubleshooting](#troubleshooting)
|
|
- [Import Failed](#import-failed)
|
|
- [Git Sync Failed](#git-sync-failed)
|
|
- [Merge Conflicts Too Complex](#merge-conflicts-too-complex)
|
|
- [Security Considerations](#security-considerations)
|
|
- [Further Reading](#further-reading)
|
|
|
|
## Export/Import Workflow
|
|
|
|
### Export Corrections
|
|
|
|
Share your corrections with team members:
|
|
|
|
```bash
|
|
# Export specific domain
|
|
python scripts/fix_transcription.py --export team_corrections.json --domain embodied_ai
|
|
|
|
# Export general corrections
|
|
python scripts/fix_transcription.py --export team_corrections.json
|
|
```
|
|
|
|
**Output**: Creates a standalone JSON file with your corrections.
|
|
|
|
### Import from Teammate
|
|
|
|
Two modes: **merge** (combine) or **replace** (overwrite):
|
|
|
|
```bash
|
|
# Merge (recommended) - combines with existing corrections
|
|
python scripts/fix_transcription.py --import team_corrections.json --merge
|
|
|
|
# Replace - overwrites existing corrections (dangerous!)
|
|
python scripts/fix_transcription.py --import team_corrections.json
|
|
```
|
|
|
|
**Merge behavior**:
|
|
- Adds new corrections
|
|
- Updates existing corrections with imported values
|
|
- Preserves corrections not in import file
|
|
|
|
### Team Workflow Example
|
|
|
|
**Person A (Domain Expert)**:
|
|
```bash
|
|
# Build correction dictionary
|
|
python fix_transcription.py --add "巨升" "具身" --domain embodied_ai
|
|
python fix_transcription.py --add "奇迹创坛" "奇绩创坛" --domain embodied_ai
|
|
# ... add 50 more corrections ...
|
|
|
|
# Export for team
|
|
python fix_transcription.py --export ai_corrections.json --domain embodied_ai
|
|
# Send ai_corrections.json to team via Slack/email
|
|
```
|
|
|
|
**Person B (Team Member)**:
|
|
```bash
|
|
# Receive ai_corrections.json
|
|
# Import and merge with existing corrections
|
|
python fix_transcription.py --import ai_corrections.json --merge
|
|
|
|
# Now Person B has all 50+ corrections!
|
|
```
|
|
|
|
## Git-Based Collaboration
|
|
|
|
For teams using Git, version control the entire correction database.
|
|
|
|
### Initial Setup
|
|
|
|
**Person A (First User)**:
|
|
```bash
|
|
cd ~/.transcript-fixer
|
|
git init
|
|
git add corrections.json context_rules.json config.json
|
|
git add domains/
|
|
git commit -m "Initial correction database"
|
|
|
|
# Push to shared repo
|
|
git remote add origin git@github.com:org/transcript-corrections.git
|
|
git push -u origin main
|
|
```
|
|
|
|
### Team Members Clone
|
|
|
|
**Person B, C, D (Team Members)**:
|
|
```bash
|
|
# Clone shared corrections
|
|
git clone git@github.com:org/transcript-corrections.git ~/.transcript-fixer
|
|
|
|
# Now everyone has the same corrections!
|
|
```
|
|
|
|
### Ongoing Sync
|
|
|
|
**Daily workflow**:
|
|
```bash
|
|
# Morning: Pull team updates
|
|
cd ~/.transcript-fixer
|
|
git pull origin main
|
|
|
|
# During day: Add corrections
|
|
python fix_transcription.py --add "错误" "正确"
|
|
|
|
# Evening: Push your additions
|
|
cd ~/.transcript-fixer
|
|
git add corrections.json
|
|
git commit -m "Added 5 new embodied AI corrections"
|
|
git push origin main
|
|
```
|
|
|
|
### Handling Conflicts
|
|
|
|
When two people add different corrections to same file:
|
|
|
|
```bash
|
|
cd ~/.transcript-fixer
|
|
git pull origin main
|
|
|
|
# If conflict occurs:
|
|
# CONFLICT in corrections.json
|
|
|
|
# Option 1: Manual merge (recommended)
|
|
nano corrections.json # Edit to combine both changes
|
|
git add corrections.json
|
|
git commit -m "Merged corrections from teammate"
|
|
git push
|
|
|
|
# Option 2: Keep yours
|
|
git checkout --ours corrections.json
|
|
git add corrections.json
|
|
git commit -m "Kept local corrections"
|
|
git push
|
|
|
|
# Option 3: Keep theirs
|
|
git checkout --theirs corrections.json
|
|
git add corrections.json
|
|
git commit -m "Used teammate's corrections"
|
|
git push
|
|
```
|
|
|
|
**Best Practice**: JSON merge conflicts are usually easy - just combine the correction entries from both versions.
|
|
|
|
## Selective Domain Sharing
|
|
|
|
Share only specific domains with different teams:
|
|
|
|
### Finance Team
|
|
```bash
|
|
# Finance team exports their domain
|
|
python fix_transcription.py --export finance_corrections.json --domain finance
|
|
|
|
# Share finance_corrections.json with finance team only
|
|
```
|
|
|
|
### AI Team
|
|
```bash
|
|
# AI team exports their domain
|
|
python fix_transcription.py --export ai_corrections.json --domain embodied_ai
|
|
|
|
# Share ai_corrections.json with AI team only
|
|
```
|
|
|
|
### Individual imports specific domains
|
|
```bash
|
|
# Alice works on both finance and AI
|
|
python fix_transcription.py --import finance_corrections.json --merge
|
|
python fix_transcription.py --import ai_corrections.json --merge
|
|
```
|
|
|
|
## Git Branching Strategy
|
|
|
|
For larger teams, use branches for different domains or workflows:
|
|
|
|
### Feature Branches
|
|
```bash
|
|
# Create branch for major dictionary additions
|
|
git checkout -b add-medical-terms
|
|
python fix_transcription.py --add "医疗术语" "正确术语" --domain medical
|
|
# ... add 100 medical corrections ...
|
|
git add domains/medical.json
|
|
git commit -m "Added 100 medical terminology corrections"
|
|
git push origin add-medical-terms
|
|
|
|
# Create PR for review
|
|
# After approval, merge to main
|
|
```
|
|
|
|
### Domain Branches (Alternative)
|
|
```bash
|
|
# Separate branches per domain
|
|
git checkout -b domain/embodied-ai
|
|
# Work on AI corrections
|
|
git push origin domain/embodied-ai
|
|
|
|
git checkout -b domain/finance
|
|
# Work on finance corrections
|
|
git push origin domain/finance
|
|
```
|
|
|
|
## Automated Sync (Advanced)
|
|
|
|
Set up automatic Git sync using cron/Task Scheduler:
|
|
|
|
### macOS/Linux Cron
|
|
```bash
|
|
# Edit crontab
|
|
crontab -e
|
|
|
|
# Add daily sync at 9 AM and 6 PM
|
|
0 9,18 * * * cd ~/.transcript-fixer && git pull origin main && git push origin main
|
|
```
|
|
|
|
### Windows Task Scheduler
|
|
```powershell
|
|
# Create scheduled task
|
|
$action = New-ScheduledTaskAction -Execute "git" -Argument "pull origin main" -WorkingDirectory "$env:USERPROFILE\.transcript-fixer"
|
|
$trigger = New-ScheduledTaskTrigger -Daily -At 9am
|
|
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "SyncTranscriptCorrections"
|
|
```
|
|
|
|
## Backup and Recovery
|
|
|
|
### Backup Strategy
|
|
```bash
|
|
# Weekly backup to cloud
|
|
cd ~/.transcript-fixer
|
|
tar -czf transcript-corrections-$(date +%Y%m%d).tar.gz corrections.json context_rules.json domains/
|
|
# Upload to Dropbox/Google Drive/S3
|
|
```
|
|
|
|
### Recovery from Backup
|
|
```bash
|
|
# Extract backup
|
|
tar -xzf transcript-corrections-20250127.tar.gz -C ~/.transcript-fixer/
|
|
```
|
|
|
|
### Recovery from Git
|
|
```bash
|
|
# View history
|
|
cd ~/.transcript-fixer
|
|
git log corrections.json
|
|
|
|
# Restore from 3 commits ago
|
|
git checkout HEAD~3 corrections.json
|
|
|
|
# Or restore specific version
|
|
git checkout abc123def corrections.json
|
|
```
|
|
|
|
## Team Best Practices
|
|
|
|
1. **Pull Before Push**: Always `git pull` before starting work
|
|
2. **Commit Often**: Small, frequent commits better than large infrequent ones
|
|
3. **Descriptive Messages**: "Added 5 finance terms" better than "updates"
|
|
4. **Review Process**: Use PRs for major dictionary changes (100+ corrections)
|
|
5. **Domain Ownership**: Assign domain experts as reviewers
|
|
6. **Weekly Sync**: Schedule team sync meetings to review learned suggestions
|
|
7. **Backup Policy**: Weekly backups of entire `~/.transcript-fixer/`
|
|
|
|
## Integration with CI/CD
|
|
|
|
For enterprise teams, integrate validation into CI:
|
|
|
|
### GitHub Actions Example
|
|
```yaml
|
|
# .github/workflows/validate-corrections.yml
|
|
name: Validate Corrections
|
|
|
|
on:
|
|
pull_request:
|
|
paths:
|
|
- 'corrections.json'
|
|
- 'domains/*.json'
|
|
|
|
jobs:
|
|
validate:
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v2
|
|
|
|
- name: Validate JSON
|
|
run: |
|
|
python -m json.tool corrections.json > /dev/null
|
|
for file in domains/*.json; do
|
|
python -m json.tool "$file" > /dev/null
|
|
done
|
|
|
|
- name: Check for duplicates
|
|
run: |
|
|
python scripts/check_duplicates.py corrections.json
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Import Failed
|
|
```bash
|
|
# Check JSON validity
|
|
python -m json.tool team_corrections.json
|
|
|
|
# If invalid, fix JSON syntax errors
|
|
nano team_corrections.json
|
|
```
|
|
|
|
### Git Sync Failed
|
|
```bash
|
|
# Check remote connection
|
|
git remote -v
|
|
|
|
# Re-add if needed
|
|
git remote set-url origin git@github.com:org/corrections.git
|
|
|
|
# Verify SSH keys
|
|
ssh -T git@github.com
|
|
```
|
|
|
|
### Merge Conflicts Too Complex
|
|
```bash
|
|
# Nuclear option: Keep one version
|
|
git checkout --ours corrections.json # Keep yours
|
|
# OR
|
|
git checkout --theirs corrections.json # Keep theirs
|
|
|
|
# Then re-import the other version
|
|
python fix_transcription.py --import other_version.json --merge
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
1. **Private Repos**: Use private Git repositories for company-specific corrections
|
|
2. **Access Control**: Limit who can push to main branch
|
|
3. **Secret Scanning**: Never commit API keys (already handled by security_scan.py)
|
|
4. **Audit Trail**: Git history provides full audit trail of who changed what
|
|
5. **Backup Encryption**: Encrypt backups if containing sensitive terminology
|
|
|
|
## Further Reading
|
|
|
|
- Git workflows: https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows
|
|
- JSON validation: https://jsonlint.com/
|
|
- Team Git practices: https://github.com/git-guides
|