
# Storage Format Reference

This document describes the SQLite database format used by transcript-fixer v2.0.

## Table of Contents

- [Database Location](#database-location)
- [Database Schema](#database-schema)
- [Views](#views)
- [Querying the Database](#querying-the-database)
- [Import/Export](#importexport)
- [Backup Strategy](#backup-strategy)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)

## Database Location

- **Path:** `~/.transcript-fixer/corrections.db`
- **Type:** SQLite 3 database with ACID guarantees

## Database Schema

### Core Tables

#### corrections

Main correction dictionary storage.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| from_text | TEXT | NOT NULL | Original (incorrect) text |
| to_text | TEXT | NOT NULL | Corrected text |
| domain | TEXT | DEFAULT 'general' | Correction domain |
| source | TEXT | CHECK IN ('manual', 'learned', 'imported') | Origin of correction |
| confidence | REAL | CHECK 0.0-1.0 | Confidence score |
| added_by | TEXT | | User who added |
| added_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When added |
| usage_count | INTEGER | DEFAULT 0, CHECK >= 0 | Times used |
| last_used | TIMESTAMP | | Last usage time |
| notes | TEXT | | Optional notes |
| is_active | BOOLEAN | DEFAULT 1 | Soft-delete flag |

**Unique constraint:** `(from_text, domain)`

**Indexes:** `domain`, `source`, `added_at`, `is_active`, `from_text`
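For orientation, the column list above corresponds roughly to DDL like the following. This is a sketch reconstructed from the table description, not the exact schema shipped with transcript-fixer; it also demonstrates how the `(from_text, domain)` unique constraint behaves:

```python
import sqlite3

# Sketch of the corrections table, reconstructed from the column list above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE corrections (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    from_text   TEXT NOT NULL,
    to_text     TEXT NOT NULL,
    domain      TEXT DEFAULT 'general',
    source      TEXT CHECK (source IN ('manual', 'learned', 'imported')),
    confidence  REAL CHECK (confidence BETWEEN 0.0 AND 1.0),
    added_by    TEXT,
    added_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    usage_count INTEGER DEFAULT 0 CHECK (usage_count >= 0),
    last_used   TIMESTAMP,
    notes       TEXT,
    is_active   BOOLEAN DEFAULT 1,
    UNIQUE (from_text, domain)
);
""")

conn.execute(
    "INSERT INTO corrections (from_text, to_text, source) VALUES (?, ?, 'manual')",
    ("teh", "the"),
)

# The (from_text, domain) unique constraint rejects a duplicate in the same domain:
try:
    conn.execute("INSERT INTO corrections (from_text, to_text) VALUES (?, ?)",
                 ("teh", "they"))
except sqlite3.IntegrityError:
    print("duplicate rejected")  # prints "duplicate rejected"
```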

#### context_rules

Regex-based, context-aware correction rules.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| pattern | TEXT | NOT NULL, UNIQUE | Regex pattern |
| replacement | TEXT | NOT NULL | Replacement text |
| description | TEXT | | Rule explanation |
| priority | INTEGER | DEFAULT 0 | Higher = applied first |
| is_active | BOOLEAN | DEFAULT 1 | Enable/disable |
| added_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When added |
| added_by | TEXT | | User who added |

**Indexes:** `priority` (DESC), `is_active`
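The ordering semantics (active rules only, higher priority applied first) can be illustrated with a small sketch. The rules and the helper below are hypothetical examples, not shipped code:

```python
import re

# Sample (pattern, replacement, priority, is_active) rows -- illustrative only.
rules = [
    (r"\bA P I\b", "API", 10, 1),
    (r"\bapi\b", "API", 5, 1),
    (r"\bjava script\b", "JavaScript", 0, 0),  # inactive: skipped
]

def apply_context_rules(text, rules):
    # Mirror the documented ordering: is_active = 1 only, priority DESC.
    for pattern, replacement, _priority, _active in sorted(
        (r for r in rules if r[3] == 1), key=lambda r: r[2], reverse=True
    ):
        text = re.sub(pattern, replacement, text)
    return text

print(apply_context_rules("the A P I and the java script api", rules))
# → "the API and the java script API"
```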

#### correction_history

Audit log for all correction runs.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| filename | TEXT | NOT NULL | File corrected |
| domain | TEXT | NOT NULL | Domain used |
| run_timestamp | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When run |
| original_length | INTEGER | CHECK >= 0 | Original file size |
| stage1_changes | INTEGER | CHECK >= 0 | Dictionary changes |
| stage2_changes | INTEGER | CHECK >= 0 | AI changes |
| model | TEXT | | AI model used |
| execution_time_ms | INTEGER | | Runtime in ms |
| success | BOOLEAN | DEFAULT 1 | Success flag |
| error_message | TEXT | | Error if failed |

**Indexes:** `run_timestamp` (DESC), `domain`, `success`

#### correction_changes

Detailed changes made in each run.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| history_id | INTEGER | FOREIGN KEY → correction_history | Parent run |
| line_number | INTEGER | | Line in file |
| from_text | TEXT | NOT NULL | Original text |
| to_text | TEXT | NOT NULL | Corrected text |
| rule_type | TEXT | CHECK IN ('context', 'dictionary', 'ai') | Rule type |
| rule_id | INTEGER | | Reference to rule |
| context_before | TEXT | | Text before |
| context_after | TEXT | | Text after |

**Foreign key:** `history_id` → `correction_history.id` (CASCADE DELETE)

**Indexes:** `history_id`, `rule_type`

#### learned_suggestions

AI-detected patterns pending review.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| from_text | TEXT | NOT NULL | Pattern detected |
| to_text | TEXT | NOT NULL | Suggested correction |
| domain | TEXT | DEFAULT 'general' | Domain |
| frequency | INTEGER | CHECK > 0 | Times seen |
| confidence | REAL | CHECK 0.0-1.0 | Confidence score |
| first_seen | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | First occurrence |
| last_seen | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | Last occurrence |
| status | TEXT | CHECK IN ('pending', 'approved', 'rejected') | Review status |
| reviewed_at | TIMESTAMP | | When reviewed |
| reviewed_by | TEXT | | Who reviewed |

**Unique constraint:** `(from_text, to_text, domain)`

**Indexes:** `status`, `domain`, `confidence` (DESC), `frequency` (DESC)
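As a sketch of how these columns combine with the learning thresholds stored in `system_config` (defaults: `learning_frequency_threshold = 3`, `learning_confidence_threshold = 0.8`), the helper below gates which pending patterns surface for review. The rows and the function are illustrative, not shipped code:

```python
# Sample learned_suggestions rows (field names mirror the table; data is made up).
suggestions = [
    {"from_text": "新能源车", "to_text": "新能源汽车", "frequency": 5, "confidence": 0.92, "status": "pending"},
    {"from_text": "emboddied", "to_text": "embodied", "frequency": 2, "confidence": 0.95, "status": "pending"},
    {"from_text": "GLM 4", "to_text": "GLM-4.6", "frequency": 7, "confidence": 0.60, "status": "pending"},
]

def ready_for_review(s, freq_threshold=3, conf_threshold=0.8):
    # Only pending patterns that clear both thresholds are worth reviewing.
    return (s["status"] == "pending"
            and s["frequency"] >= freq_threshold
            and s["confidence"] >= conf_threshold)

print([s["from_text"] for s in suggestions if ready_for_review(s)])
# → ['新能源车']
```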

#### suggestion_examples

Example occurrences of learned patterns.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| suggestion_id | INTEGER | FOREIGN KEY → learned_suggestions | Parent suggestion |
| filename | TEXT | NOT NULL | File where found |
| line_number | INTEGER | | Line number |
| context | TEXT | NOT NULL | Surrounding text |
| occurred_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When found |

**Foreign key:** `suggestion_id` → `learned_suggestions.id` (CASCADE DELETE)

**Index:** `suggestion_id`

#### system_config

System configuration key-value store.

| Column | Type | Constraints | Description |
|---|---|---|---|
| key | TEXT | PRIMARY KEY | Config key |
| value | TEXT | NOT NULL | Config value |
| value_type | TEXT | CHECK IN ('string', 'int', 'float', 'boolean', 'json') | Value type |
| description | TEXT | | Config description |
| updated_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | Last update |

**Default values:**

- `schema_version`: `"2.0"`
- `api_provider`: `"GLM"`
- `api_model`: `"GLM-4.6"`
- `default_domain`: `"general"`
- `auto_learn_enabled`: `"true"`
- `learning_frequency_threshold`: `"3"`
- `learning_confidence_threshold`: `"0.8"`
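Since every `value` is stored as TEXT, readers must cast it according to `value_type`. The casting rules below are an assumption about reasonable behavior, not the shipped implementation:

```python
import json

def decode_config(value: str, value_type: str):
    # Hypothetical decoder: map the documented value_type tags to Python casts.
    casters = {
        "string": str,
        "int": int,
        "float": float,
        "boolean": lambda v: v.lower() in ("true", "1"),
        "json": json.loads,
    }
    return casters[value_type](value)

print(decode_config("3", "int"))         # → 3
print(decode_config("true", "boolean"))  # → True
print(decode_config("0.8", "float"))     # → 0.8
```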

#### audit_log

Comprehensive audit trail for all operations.

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PRIMARY KEY | Auto-increment ID |
| timestamp | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When occurred |
| action | TEXT | NOT NULL | Action type |
| entity_type | TEXT | NOT NULL | Entity affected |
| entity_id | INTEGER | | Entity ID |
| user | TEXT | | User who performed the action |
| details | TEXT | | Action details |
| success | BOOLEAN | DEFAULT 1 | Success flag |
| error_message | TEXT | | Error if failed |

**Indexes:** `timestamp` (DESC), `action`, `entity_type`, `success`

## Views

### active_corrections

Quick access to active corrections.

```sql
SELECT id, from_text, to_text, domain, source, confidence, usage_count, last_used, added_at
FROM corrections
WHERE is_active = 1
ORDER BY domain, from_text;
```

### pending_suggestions

Suggestions pending review, with example counts.

```sql
SELECT s.id, s.from_text, s.to_text, s.domain, s.frequency, s.confidence,
       s.first_seen, s.last_seen, COUNT(e.id) AS example_count
FROM learned_suggestions s
LEFT JOIN suggestion_examples e ON s.id = e.suggestion_id
WHERE s.status = 'pending'
GROUP BY s.id
ORDER BY s.confidence DESC, s.frequency DESC;
```

### correction_statistics

Per-domain statistics.

```sql
SELECT domain,
       COUNT(*) AS total_corrections,
       COUNT(CASE WHEN source = 'manual' THEN 1 END) AS manual_count,
       COUNT(CASE WHEN source = 'learned' THEN 1 END) AS learned_count,
       COUNT(CASE WHEN source = 'imported' THEN 1 END) AS imported_count,
       SUM(usage_count) AS total_usage,
       MAX(added_at) AS last_updated
FROM corrections
WHERE is_active = 1
GROUP BY domain;
```

## Querying the Database

### Using the Python API

```python
from pathlib import Path

from core import CorrectionRepository, CorrectionService

# Initialize
db_path = Path.home() / ".transcript-fixer" / "corrections.db"
repository = CorrectionRepository(db_path)
service = CorrectionService(repository)

# Add a correction
service.add_correction("错误", "正确", domain="general")

# Get corrections
corrections = service.get_corrections(domain="general")

# Get statistics
stats = service.get_statistics(domain="general")
print(f"Total: {stats['total_corrections']}")

# Close
service.close()
```

### Using the SQLite CLI

```shell
# Open the database
sqlite3 ~/.transcript-fixer/corrections.db
```

Then, at the `sqlite>` prompt:

```sql
-- View active corrections
SELECT from_text, to_text, domain FROM active_corrections;

-- View statistics
SELECT * FROM correction_statistics;

-- View pending suggestions
SELECT * FROM pending_suggestions;

-- Check schema version
SELECT value FROM system_config WHERE key = 'schema_version';
```

## Import/Export

### Export to JSON

```python
import json

service = _get_service()
corrections = service.export_corrections(domain="general")

# Write to file
with open("export.json", "w", encoding="utf-8") as f:
    json.dump({
        "version": "2.0",
        "domain": "general",
        "corrections": corrections
    }, f, ensure_ascii=False, indent=2)
```

### Import from JSON

```python
import json

with open("import.json", "r", encoding="utf-8") as f:
    data = json.load(f)

service = _get_service()
inserted, updated, skipped = service.import_corrections(
    corrections=data["corrections"],
    domain=data.get("domain", "general"),
    merge=True,
    validate_all=True,
)

print(f"Imported: {inserted} new, {updated} updated, {skipped} skipped")
```

## Backup Strategy

### Automatic Backups

The system relies on SQLite's ACID guarantees and automatic journaling to keep the database consistent. Note that journaling protects against crashes and interrupted writes; it does not replace explicit backups.

### Manual Backups

```shell
# Copy the database file
cp ~/.transcript-fixer/corrections.db ~/backups/corrections_$(date +%Y%m%d).db

# Or use SQLite's online backup command (sqlite3 does not expand ~ itself,
# so pass an absolute path)
sqlite3 ~/.transcript-fixer/corrections.db ".backup $HOME/backups/corrections.db"
```
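From Python, the standard library's `sqlite3` module exposes the same online backup mechanism via `Connection.backup` (available since Python 3.7), which safely copies a database even while it is open elsewhere. The in-memory databases below stand in for the real file paths:

```python
import sqlite3

# Stand-in for ~/.transcript-fixer/corrections.db
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")

# Stand-in for the backup file; in practice: sqlite3.connect("backup.db")
dst = sqlite3.connect(":memory:")
src.backup(dst)  # page-by-page online copy of the whole database

print(dst.execute("SELECT x FROM t").fetchone())  # → (42,)
src.close()
```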

### Version Control

Recommended: keep configuration and export files in Git, but NOT the database:

```gitignore
*.db
*.db-journal
*.bak
```

Instead, export corrections periodically:

```shell
python scripts/fix_transcription.py --export-json corrections_backup.json
git add corrections_backup.json
git commit -m "Backup corrections"
```

## Best Practices

1. **Regular exports:** Export to JSON weekly for team sharing
2. **Database backups:** Back up the `.db` file before major changes
3. **Use transactions:** All modifications use ACID transactions automatically
4. **Soft deletes:** Records are marked inactive, not deleted (preserves the audit trail)
5. **Validate:** Run `--validate` after manual database changes
6. **Statistics:** Check usage patterns via the `correction_statistics` view
7. **Cleanup:** Old history can be archived (query by `run_timestamp`)
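On point 7, one subtlety worth knowing: deleting old `correction_history` rows cascades to `correction_changes` only when foreign keys are enabled on the connection, because SQLite leaves `PRAGMA foreign_keys` off by default. An abbreviated sketch (schema trimmed to the relevant columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE correction_history (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    run_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE correction_changes (
    id INTEGER PRIMARY KEY,
    history_id INTEGER REFERENCES correction_history(id) ON DELETE CASCADE,
    from_text TEXT NOT NULL,
    to_text TEXT NOT NULL
);
INSERT INTO correction_history (id, filename, run_timestamp)
VALUES (1, 'old.txt', '2024-01-01 00:00:00'),
       (2, 'new.txt', '2025-10-01 00:00:00');
INSERT INTO correction_changes (history_id, from_text, to_text)
VALUES (1, 'teh', 'the'), (2, 'adress', 'address');
""")

# Prune runs older than a cutoff; their child rows go with them.
conn.execute("DELETE FROM correction_history WHERE run_timestamp < '2025-01-01'")
print(conn.execute("SELECT COUNT(*) FROM correction_changes").fetchone())  # → (1,)
```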

## Troubleshooting

### Database Locked

```shell
# Check for lingering connections
lsof ~/.transcript-fixer/corrections.db

# If needed, back up and rebuild the file
cp corrections.db corrections_backup.db
sqlite3 corrections.db "VACUUM;"
```

### Corrupted Database

```shell
# Check integrity
sqlite3 corrections.db "PRAGMA integrity_check;"

# Recover what is possible into a fresh database
sqlite3 corrections.db ".recover" | sqlite3 corrections_new.db
```

### Missing Tables

```shell
# Reinitialize the schema (safe: uses IF NOT EXISTS)
python -c "from core import CorrectionRepository; from pathlib import Path; CorrectionRepository(Path.home() / '.transcript-fixer' / 'corrections.db')"
```