## New Skill: transcript-fixer v1.0.0 Correct speech-to-text (ASR/STT) transcription errors through dictionary-based rules and AI-powered corrections with automatic pattern learning. **Features:** - Two-stage correction pipeline (dictionary + AI) - Automatic pattern detection and learning - Domain-specific dictionaries (general, embodied_ai, finance, medical) - SQLite-based correction repository - Team collaboration with import/export - GLM API integration for AI corrections - Cost optimization through dictionary promotion **Use cases:** - Correcting meeting notes, lecture recordings, or interview transcripts - Fixing Chinese/English homophone errors and technical terminology - Building domain-specific correction dictionaries - Improving transcript accuracy through iterative learning **Documentation:** - Complete workflow guides in references/ - SQL query templates - Troubleshooting guide - Team collaboration patterns - API setup instructions **Marketplace updates:** - Updated marketplace to v1.8.0 - Added transcript-fixer plugin (category: productivity) - Updated README.md with skill description and use cases - Updated CLAUDE.md with skill listing and counts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
26 KiB
Architecture Reference
Technical implementation details of the transcript-fixer system.
Table of Contents
- Module Structure
- Design Principles
- Module Architecture
- Data Flow
- SQLite Architecture (v2.0)
- Module Details
- State Management
- Error Handling Strategy
- Testing Strategy
- Performance Considerations
- Security Architecture
- Extensibility Points
- Dependencies
- Deployment
- Further Reading
Module Structure
The codebase follows a modular package structure for maintainability:
scripts/
├── fix_transcription.py # Main entry point (~70 lines)
├── core/ # Business logic & data access
│ ├── correction_repository.py # Data access layer (466 lines)
│ ├── correction_service.py # Business logic layer (525 lines)
│ ├── schema.sql # SQLite database schema (216 lines)
│ ├── dictionary_processor.py # Stage 1 processor (140 lines)
│ ├── ai_processor.py # Stage 2 processor (199 lines)
│ └── learning_engine.py # Pattern detection (252 lines)
├── cli/ # Command-line interface
│ ├── commands.py # Command handlers (180 lines)
│ └── argument_parser.py # Argument config (95 lines)
└── utils/ # Utility functions
├── diff_generator.py # Multi-format diffs (132 lines)
├── logging_config.py # Logging configuration (130 lines)
└── validation.py # SQLite validation (105 lines)
Benefits of modular structure:
- Clear separation of concerns (business logic / CLI / utilities)
- Easy to locate and modify specific functionality
- Supports independent testing of modules
- Scales well as codebase grows
- Follows Python package best practices
Design Principles
SOLID Compliance
Every module follows SOLID principles for maintainability:
-
Single Responsibility Principle (SRP)
- Each module has exactly one reason to change
CorrectionRepository: Database operations onlyCorrectionService: Business logic and validation onlyDictionaryProcessor: Text transformation onlyAIProcessor: API communication onlyLearningEngine: Pattern analysis only
-
Open/Closed Principle (OCP)
- Open for extension via SQL INSERT
- Closed for modification (no code changes needed)
- Add corrections via CLI or SQL without editing Python
-
Liskov Substitution Principle (LSP)
- All processors implement same interface
- Can swap implementations without breaking workflow
-
Interface Segregation Principle (ISP)
- Repository, Service, Processor, Engine are independent
- No unnecessary dependencies
-
Dependency Inversion Principle (DIP)
- Service depends on Repository interface
- CLI depends on Service interface
- Not tied to concrete implementations
File Length Limits
All files comply with code quality standards:
| File | Lines | Limit | Status |
|---|---|---|---|
validation.py |
105 | 200 | ✅ |
logging_config.py |
130 | 200 | ✅ |
diff_generator.py |
132 | 200 | ✅ |
dictionary_processor.py |
140 | 200 | ✅ |
commands.py |
180 | 200 | ✅ |
ai_processor.py |
199 | 250 | ✅ |
schema.sql |
216 | 250 | ✅ |
learning_engine.py |
252 | 250 | ✅ |
correction_repository.py |
466 | 500 | ✅ |
correction_service.py |
525 | 550 | ✅ |
Module Architecture
Layer Diagram
┌─────────────────────────────────────────┐
│ CLI Layer (fix_transcription.py) │
│ - Argument parsing │
│ - Command routing │
│ - User interaction │
└───────────────┬─────────────────────────┘
│
┌───────────────▼─────────────────────────┐
│ Business Logic Layer │
│ │
│ ┌──────────────────┐ ┌──────────────┐│
│ │ Dictionary │ │ AI ││
│ │ Processor │ │ Processor ││
│ │ (Stage 1) │ │ (Stage 2) ││
│ └──────────────────┘ └──────────────┘│
│ │
│ ┌──────────────────┐ ┌──────────────┐│
│ │ Learning │ │ Diff ││
│ │ Engine │ │ Generator ││
│ │ (Pattern detect) │ │ (Stage 3) ││
│ └──────────────────┘ └──────────────┘│
└───────────────┬─────────────────────────┘
│
┌───────────────▼─────────────────────────┐
│ Data Access Layer (SQLite-based) │
│ │
│ ┌──────────────────────────────────┐ │
│ │ CorrectionManager (Facade) │ │
│ │ - Backward-compatible API │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ CorrectionService │ │
│ │ - Business logic │ │
│ │ - Validation │ │
│ │ - Import/Export │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ CorrectionRepository │ │
│ │ - ACID transactions │ │
│ │ - Thread-safe connections │ │
│ │ - Audit logging │ │
│ └──────────────────────────────────┘ │
└───────────────┬─────────────────────────┘
│
┌───────────────▼─────────────────────────┐
│ Storage Layer │
│ ~/.transcript-fixer/corrections.db │
│ - SQLite database (ACID compliant) │
│ - 8 normalized tables + 3 views │
│ - Comprehensive indexes │
│ - Foreign key constraints │
└─────────────────────────────────────────┘
Data Flow
Correction Workflow
1. User Input
↓
2. fix_transcription.py (Orchestrator)
↓
3. CorrectionService.get_corrections()
← Query from ~/.transcript-fixer/corrections.db
↓
4. DictionaryProcessor.process()
- Apply context rules (regex)
- Apply dictionary replacements
- Track changes
↓
5. AIProcessor.process()
- Split into chunks
- Call GLM-4.6 API
- Retry with fallback on error
- Track AI changes
↓
6. CorrectionService.save_history()
→ Insert into correction_history table
↓
7. LearningEngine.analyze_and_suggest()
- Query correction_history table
- Detect patterns (frequency ≥3, confidence ≥80%)
- Generate suggestions
→ Insert into learned_suggestions table
↓
8. Output Files
- {filename}_stage1.md
- {filename}_stage2.md
Learning Cycle
Run 1: meeting1.md
AI corrects: "巨升" → "具身"
↓
INSERT INTO correction_history
Run 2: meeting2.md
AI corrects: "巨升" → "具身"
↓
INSERT INTO correction_history
Run 3: meeting3.md
AI corrects: "巨升" → "具身"
↓
INSERT INTO correction_history
↓
LearningEngine queries patterns:
- SELECT ... GROUP BY from_text, to_text
- Frequency: 3, Confidence: 100%
↓
INSERT INTO learned_suggestions (status='pending')
↓
User reviews: --review-learned
↓
User approves: --approve "巨升" "具身"
↓
INSERT INTO corrections (source='learned')
UPDATE learned_suggestions (status='approved')
↓
Future runs query corrections table (Stage 1 - faster!)
SQLite Architecture (v2.0)
Two-Layer Data Access (Simplified)
Design Principle: No users = no backward compatibility overhead.
The system uses a clean 2-layer architecture:
┌──────────────────────────────────────────┐
│ CLI Commands (commands.py) │
│ - User interaction │
│ - Command routing │
└──────────────┬───────────────────────────┘
│
┌──────────────▼───────────────────────────┐
│ CorrectionService (Business Logic) │
│ - Input validation & sanitization │
│ - Business rules enforcement │
│ - Import/export orchestration │
│ - Statistics calculation │
│ - History tracking │
└──────────────┬───────────────────────────┘
│
┌──────────────▼───────────────────────────┐
│ CorrectionRepository (Data Access) │
│ - ACID transactions │
│ - Thread-safe connections │
│ - SQL query execution │
│ - Audit logging │
└──────────────┬───────────────────────────┘
│
┌──────────────▼───────────────────────────┐
│ SQLite Database (corrections.db) │
│ - 8 normalized tables │
│ - Foreign key constraints │
│ - Comprehensive indexes │
│ - 3 views for common queries │
└───────────────────────────────────────────┘
Database Schema (schema.sql)
Core Tables:
-
corrections (main correction storage)
- Primary key: id
- Unique constraint: (from_text, domain)
- Indexes: domain, source, added_at, is_active, from_text
- Fields: confidence (0.0-1.0), usage_count, notes
-
context_rules (regex-based rules)
- Pattern + replacement with priority ordering
- Indexes: priority (DESC), is_active
-
correction_history (audit trail for runs)
- Tracks: filename, domain, timestamps, change counts
- Links to correction_changes via foreign key
- Indexes: run_timestamp, domain, success
-
correction_changes (detailed change log)
- Links to history via foreign key (CASCADE delete)
- Stores: line_number, from/to text, rule_type, context
- Indexes: history_id, rule_type
-
learned_suggestions (AI-detected patterns)
- Status: pending → approved/rejected
- Unique constraint: (from_text, to_text, domain)
- Fields: frequency, confidence, timestamps
- Indexes: status, domain, confidence, frequency
-
suggestion_examples (occurrences of patterns)
- Links to learned_suggestions via foreign key
- Stores context where pattern occurred
-
system_config (configuration storage)
- Key-value store with type safety
- Stores: API settings, thresholds, defaults
-
audit_log (comprehensive audit trail)
- Tracks all database operations
- Fields: action, entity_type, entity_id, user, success
- Indexes: timestamp, action, entity_type, success
Views (for common queries):
active_corrections: Active corrections onlypending_suggestions: Suggestions pending reviewcorrection_statistics: Statistics per domain
ACID Guarantees
Atomicity: All-or-nothing transactions
with self._transaction() as conn:
conn.execute("INSERT ...") # Either all succeed
conn.execute("UPDATE ...") # or all rollback
Consistency: Constraints enforced
- Foreign key constraints
- Check constraints (confidence 0.0-1.0, usage_count ≥ 0)
- Unique constraints
Isolation: Serializable transactions
conn.execute("BEGIN IMMEDIATE") # Acquire write lock
Durability: Changes persisted to disk
- SQLite guarantees persistence after commit
- Backup before migrations
Thread Safety
Thread-local connections:
def _get_connection(self):
if not hasattr(self._local, 'connection'):
self._local.connection = sqlite3.connect(...)
return self._local.connection
Connection pooling:
- One connection per thread
- Automatic cleanup on close
- Foreign keys enabled per connection
Clean Architecture (No Legacy)
Design Philosophy:
- Clean 2-layer architecture (Service → Repository)
- No backward compatibility overhead
- Direct API design without legacy constraints
- YAGNI principle: Build for current needs, not hypothetical migrations
Module Details
fix_transcription.py (Orchestrator)
Responsibilities:
- Parse CLI arguments
- Route commands to appropriate handlers
- Coordinate workflow between modules
- Display user feedback
Key Functions:
cmd_init() # Initialize ~/.transcript-fixer/
cmd_add_correction() # Add single correction
cmd_list_corrections() # List corrections
cmd_run_correction() # Execute correction workflow
cmd_review_learned() # Review AI suggestions
cmd_approve() # Approve learned correction
Design Pattern: Command pattern with function routing
correction_repository.py (Data Access Layer)
Responsibilities:
- Execute SQL queries with ACID guarantees
- Manage thread-safe database connections
- Handle transactions (commit/rollback)
- Perform audit logging
- Convert between database rows and Python objects
Key Methods:
add_correction() # INSERT with UNIQUE handling
get_correction() # SELECT single correction
get_all_corrections() # SELECT with filters
get_corrections_dict() # For backward compatibility
update_correction() # UPDATE with transaction
delete_correction() # Soft delete (is_active=0)
increment_usage() # Track usage statistics
bulk_import_corrections() # Batch INSERT with conflict resolution
Transaction Management:
@contextmanager
def _transaction(self):
conn = self._get_connection()
try:
conn.execute("BEGIN IMMEDIATE")
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
correction_service.py (Business Logic Layer)
Responsibilities:
- Input validation and sanitization
- Business rule enforcement
- Orchestrate repository operations
- Import/export with conflict detection
- Statistics calculation
Key Methods:
# Validation
validate_correction_text() # Check length, control chars, NULL bytes
validate_domain_name() # Prevent path traversal, injection
validate_confidence() # Range check (0.0-1.0)
validate_source() # Enum validation
# Operations
add_correction() # Validate + repository.add
get_corrections() # Get corrections for domain
remove_correction() # Validate + repository.delete
# Import/Export
import_corrections() # Pre-validate + bulk import + conflict detection
export_corrections() # Query + format as JSON
# Analytics
get_statistics() # Calculate metrics per domain
Validation Rules:
@dataclass
class ValidationRules:
max_text_length: int = 1000
min_text_length: int = 1
max_domain_length: int = 50
allowed_domain_pattern: str = r'^[a-zA-Z0-9_-]+$'
CLI Integration (commands.py)
Direct Service Usage:
def _get_service():
"""Get configured CorrectionService instance."""
config_dir = Path.home() / ".transcript-fixer"
db_path = config_dir / "corrections.db"
repository = CorrectionRepository(db_path)
return CorrectionService(repository)
def cmd_add_correction(args):
service = _get_service()
service.add_correction(args.from_text, args.to_text, args.domain)
Benefits of Direct Integration:
- No unnecessary abstraction layers
- Clear data flow: CLI → Service → Repository
- Easy to understand and debug
- Performance: One less function call per operation
dictionary_processor.py (Stage 1)
Responsibilities:
- Apply context-aware regex rules
- Apply simple dictionary replacements
- Track all changes with line numbers
Processing Order:
- Context rules first (higher priority)
- Dictionary replacements second
Key Methods:
process(text) -> (corrected_text, changes)
_apply_context_rules()
_apply_dictionary()
get_summary(changes)
Change Tracking:
@dataclass
class Change:
line_number: int
from_text: str
to_text: str
rule_type: str # "dictionary" or "context_rule"
rule_name: str
ai_processor.py (Stage 2)
Responsibilities:
- Split text into API-friendly chunks
- Call GLM-4.6 API
- Handle retries with fallback model
- Track AI-suggested changes
Key Methods:
process(text, context) -> (corrected_text, changes)
_split_into_chunks() # Respect paragraph boundaries
_process_chunk() # Single API call
_build_prompt() # Construct correction prompt
Chunking Strategy:
- Max 6000 characters per chunk
- Split on paragraph boundaries (
\n\n) - If paragraph too long, split on sentences
- Preserve context across chunks
Error Handling:
- Retry with fallback model (GLM-4.5-Air)
- If both fail, use original text
- Never lose user's data
learning_engine.py (Pattern Detection)
Responsibilities:
- Analyze correction history
- Detect recurring patterns
- Calculate confidence scores
- Generate suggestions for review
- Track rejected suggestions
Algorithm:
1. Query correction_history table
2. Extract stage2 (AI) changes
3. Group by pattern (from→to)
4. Count frequency
5. Calculate confidence
6. Filter by thresholds:
- frequency ≥ 3
- confidence ≥ 0.8
7. Save to learned/pending_review.json
Confidence Calculation:
confidence = (
0.5 * frequency_score + # More occurrences = higher
0.3 * consistency_score + # Always same correction
0.2 * recency_score # Recent = higher
)
Key Methods:
analyze_and_suggest() # Main analysis pipeline
approve_suggestion() # Move to corrections.json
reject_suggestion() # Move to rejected.json
list_pending() # Get all suggestions
diff_generator.py (Stage 3)
Responsibilities:
- Generate comparison reports
- Multiple output formats
- Word-level diff analysis
Output Formats:
- Markdown summary (statistics + change list)
- Unified diff (standard diff format)
- HTML side-by-side (visual comparison)
- Inline marked ([-old-] [+new+])
Not Modified: Kept original 338-line file as-is (working well)
State Management
Database-Backed State
- All state stored in
~/.transcript-fixer/corrections.db - SQLite handles caching and transactions
- ACID guarantees prevent corruption
- Backup created before migrations
Thread-Safe Access
- Thread-local connections (one per thread)
- BEGIN IMMEDIATE for write transactions
- No global state or shared mutable data
- Each operation is independent (stateless modules)
Soft Deletes
- Records marked inactive (is_active=0) instead of DELETE
- Preserves audit trail
- Can be reactivated if needed
Error Handling Strategy
Fail Fast for User Errors
if not skill_path.exists():
print(f"❌ Error: Skill directory not found")
sys.exit(1)
Retry for Transient Errors
try:
api_call(model_primary)
except Exception:
try:
api_call(model_fallback)
except Exception:
use_original_text()
Backup Before Destructive Operations
if target_file.exists():
shutil.copy2(target_file, backup_file)
# Then overwrite target_file
Testing Strategy
Unit Testing (Recommended)
# Test dictionary processor
def test_dictionary_processor():
corrections = {"错误": "正确"}
processor = DictionaryProcessor(corrections, [])
text = "这是错误的文本"
result, changes = processor.process(text)
assert result == "这是正确的文本"
assert len(changes) == 1
# Test learning engine thresholds
def test_learning_thresholds():
engine = LearningEngine(history_dir, learned_dir)
# Create mock history with pattern appearing 3+ times
suggestions = engine.analyze_and_suggest()
assert len(suggestions) > 0
Integration Testing
# End-to-end test
python fix_transcription.py --init
python fix_transcription.py --add "test" "TEST"
python fix_transcription.py --input test.md --stage 3
# Verify output files exist
Performance Considerations
Bottlenecks
- AI API calls: Slowest part (60s timeout per chunk)
- File I/O: Negligible (JSON files are small)
- Pattern matching: Fast (regex + dict lookups)
Optimization Strategies
- Stage 1 First: Test dictionary corrections before expensive AI calls
- Chunking: Process large files in parallel chunks (future enhancement)
- Caching: Could cache API results by content hash (future enhancement)
Scalability
Current capabilities (v2.0 with SQLite):
- File size: Unlimited (chunks handle large files)
- Corrections: Tested up to 100,000 entries (with indexes)
- History: Unlimited (database handles efficiently)
- Concurrent access: Thread-safe with ACID guarantees
- Query performance: O(log n) with B-tree indexes
Performance improvements from SQLite:
- Indexed queries (domain, source, added_at)
- Views for common aggregations
- Batch imports with transactions
- Soft deletes (no data loss)
Future improvements:
- Parallel chunk processing for AI calls
- API response caching
- Full-text search for corrections
Security Architecture
Secret Management
- API keys via environment variables only
- Never hardcode credentials
- Security scanner enforces this
Backup Security
.bakfiles same permissions as originals- No encryption (user's responsibility)
- Recommendation: Use encrypted filesystems
Git Security
.gitignorefor.bakfiles- Private repos recommended
- Security scan before commits
Extensibility Points
Adding New Processors
- Create new processor class
- Implement
process(text) -> (result, changes)interface - Add to orchestrator workflow
Example:
class SpellCheckProcessor:
def process(self, text):
# Custom spell checking logic
return corrected_text, changes
Adding New Learning Algorithms
- Subclass
LearningEngine - Override
_calculate_confidence() - Adjust thresholds as needed
Adding New Export Formats
- Add method to
CorrectionManager - Support new file format
- Add CLI command
Dependencies
Required
- Python 3.8+ (
from __future__ import annotations) httpx(for API calls)
Optional
diffcommand (for unified diffs)- Git (for version control)
Development
pytest(for testing)black(for formatting)mypy(for type checking)
Deployment
User Installation
# 1. Clone or download skill to workspace
git clone <repo> transcript-fixer
cd transcript-fixer
# 2. Install dependencies
pip install -r requirements.txt
# 3. Initialize
python scripts/fix_transcription.py --init
# 4. Set API key
export GLM_API_KEY="KEY_VALUE"
# Ready to use!
CI/CD Pipeline (Future)
# Potential GitHub Actions workflow
test:
- Install dependencies
- Run unit tests
- Run integration tests
- Check code style (black, mypy)
security:
- Run security_scan.py
- Check for secrets
deploy:
- Package skill
- Upload to skill marketplace
Further Reading
- SOLID Principles: https://en.wikipedia.org/wiki/SOLID
- API Patterns:
references/glm_api_setup.md - File Formats:
references/file_formats.md - Testing: https://docs.pytest.org/