firefrost-gaming/claude-code-skills-reference

Files

daymade bd0aa12004 Release v1.8.0: Add transcript-fixer skill

## New Skill: transcript-fixer v1.0.0

Correct speech-to-text (ASR/STT) transcription errors through dictionary-based rules and AI-powered corrections with automatic pattern learning.

**Features:**
- Two-stage correction pipeline (dictionary + AI)
- Automatic pattern detection and learning
- Domain-specific dictionaries (general, embodied_ai, finance, medical)
- SQLite-based correction repository
- Team collaboration with import/export
- GLM API integration for AI corrections
- Cost optimization through dictionary promotion

**Use cases:**
- Correcting meeting notes, lecture recordings, or interview transcripts
- Fixing Chinese/English homophone errors and technical terminology
- Building domain-specific correction dictionaries
- Improving transcript accuracy through iterative learning

**Documentation:**
- Complete workflow guides in references/
- SQL query templates
- Troubleshooting guide
- Team collaboration patterns
- API setup instructions

**Marketplace updates:**
- Updated marketplace to v1.8.0
- Added transcript-fixer plugin (category: productivity)
- Updated README.md with skill description and use cases
- Updated CLAUDE.md with skill listing and counts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-28 13:16:37 +08:00

11 KiB

Raw Blame History

Storage Format Reference

This document describes the SQLite database format used by transcript-fixer v2.0.

Database Location
Database Schema
- Core Tables
- Views
Querying the Database
- Using Python API
- Using SQLite CLI
Import/Export
- Export to JSON
- Import from JSON
Backup Strategy
Best Practices
Troubleshooting

Database Location

Path: ~/.transcript-fixer/corrections.db

Type: SQLite 3 database with ACID guarantees

Database Schema

Core Tables

corrections

Main correction dictionary storage.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
from_text	TEXT	NOT NULL	Original (incorrect) text
to_text	TEXT	NOT NULL	Corrected text
domain	TEXT	DEFAULT 'general'	Correction domain
source	TEXT	CHECK IN ('manual', 'learned', 'imported')	Origin of correction
confidence	REAL	CHECK 0.0-1.0	Confidence score
added_by	TEXT		User who added
added_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	When added
usage_count	INTEGER	DEFAULT 0, CHECK >= 0	Times used
last_used	TIMESTAMP		Last usage time
notes	TEXT		Optional notes
is_active	BOOLEAN	DEFAULT 1	Soft delete flag

Unique Constraint: (from_text, domain)

Indexes: domain, source, added_at, is_active, from_text

context_rules

Regex-based context-aware correction rules.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
pattern	TEXT	NOT NULL, UNIQUE	Regex pattern
replacement	TEXT	NOT NULL	Replacement text
description	TEXT		Rule explanation
priority	INTEGER	DEFAULT 0	Higher = applied first
is_active	BOOLEAN	DEFAULT 1	Enable/disable
added_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	When added
added_by	TEXT		User who added

Indexes: priority (DESC), is_active

correction_history

Audit log for all correction runs.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
filename	TEXT	NOT NULL	File corrected
domain	TEXT	NOT NULL	Domain used
run_timestamp	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	When run
original_length	INTEGER	CHECK >= 0	Original file size
stage1_changes	INTEGER	CHECK >= 0	Dictionary changes
stage2_changes	INTEGER	CHECK >= 0	AI changes
model	TEXT		AI model used
execution_time_ms	INTEGER		Runtime in ms
success	BOOLEAN	DEFAULT 1	Success flag
error_message	TEXT		Error if failed

Indexes: run_timestamp (DESC), domain, success

correction_changes

Detailed changes made in each run.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
history_id	INTEGER	FOREIGN KEY → correction_history	Parent run
line_number	INTEGER		Line in file
from_text	TEXT	NOT NULL	Original text
to_text	TEXT	NOT NULL	Corrected text
rule_type	TEXT	CHECK IN ('context', 'dictionary', 'ai')	Rule type
rule_id	INTEGER		Reference to rule
context_before	TEXT		Text before
context_after	TEXT		Text after

Foreign Key: history_id → correction_history.id (CASCADE DELETE)

Indexes: history_id, rule_type

learned_suggestions

AI-detected patterns pending review.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
from_text	TEXT	NOT NULL	Pattern detected
to_text	TEXT	NOT NULL	Suggested correction
domain	TEXT	DEFAULT 'general'	Domain
frequency	INTEGER	CHECK > 0	Times seen
confidence	REAL	CHECK 0.0-1.0	Confidence score
first_seen	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	First occurrence
last_seen	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Last occurrence
status	TEXT	CHECK IN ('pending', 'approved', 'rejected')	Review status
reviewed_at	TIMESTAMP		When reviewed
reviewed_by	TEXT		Who reviewed

Unique Constraint: (from_text, to_text, domain)

Indexes: status, domain, confidence (DESC), frequency (DESC)

suggestion_examples

Example occurrences of learned patterns.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
suggestion_id	INTEGER	FOREIGN KEY → learned_suggestions	Parent suggestion
filename	TEXT	NOT NULL	File where found
line_number	INTEGER		Line number
context	TEXT	NOT NULL	Surrounding text
occurred_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	When found

Foreign Key: suggestion_id → learned_suggestions.id (CASCADE DELETE)

Index: suggestion_id

system_config

System configuration key-value store.

Column	Type	Constraints	Description
key	TEXT	PRIMARY KEY	Config key
value	TEXT	NOT NULL	Config value
value_type	TEXT	CHECK IN ('string', 'int', 'float', 'boolean', 'json')	Value type
description	TEXT		Config description
updated_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Last update

Default Values:

schema_version: "2.0"
api_provider: "GLM"
api_model: "GLM-4.6"
default_domain: "general"
auto_learn_enabled: "true"
learning_frequency_threshold: "3"
learning_confidence_threshold: "0.8"

audit_log

Comprehensive audit trail for all operations.

Column	Type	Constraints	Description
id	INTEGER	PRIMARY KEY	Auto-increment ID
timestamp	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	When occurred
action	TEXT	NOT NULL	Action type
entity_type	TEXT	NOT NULL	Entity affected
entity_id	INTEGER		Entity ID
user	TEXT		User who performed
details	TEXT		Action details
success	BOOLEAN	DEFAULT 1	Success flag
error_message	TEXT		Error if failed

Indexes: timestamp (DESC), action, entity_type, success

Views

active_corrections

Quick access to active corrections.

SELECT id, from_text, to_text, domain, source, confidence, usage_count, last_used, added_at
FROM corrections
WHERE is_active = 1
ORDER BY domain, from_text;

pending_suggestions

Suggestions pending review with example count.

SELECT s.id, s.from_text, s.to_text, s.domain, s.frequency, s.confidence,
       s.first_seen, s.last_seen, COUNT(e.id) as example_count
FROM learned_suggestions s
LEFT JOIN suggestion_examples e ON s.id = e.suggestion_id
WHERE s.status = 'pending'
GROUP BY s.id
ORDER BY s.confidence DESC, s.frequency DESC;

correction_statistics

Statistics per domain.

SELECT domain,
       COUNT(*) as total_corrections,
       COUNT(CASE WHEN source = 'manual' THEN 1 END) as manual_count,
       COUNT(CASE WHEN source = 'learned' THEN 1 END) as learned_count,
       COUNT(CASE WHEN source = 'imported' THEN 1 END) as imported_count,
       SUM(usage_count) as total_usage,
       MAX(added_at) as last_updated
FROM corrections
WHERE is_active = 1
GROUP BY domain;

Querying the Database

Using Python API

from pathlib import Path
from core import CorrectionRepository, CorrectionService

# Initialize
db_path = Path.home() / ".transcript-fixer" / "corrections.db"
repository = CorrectionRepository(db_path)
service = CorrectionService(repository)

# Add correction
service.add_correction("错误", "正确", domain="general")

# Get corrections
corrections = service.get_corrections(domain="general")

# Get statistics
stats = service.get_statistics(domain="general")
print(f"Total: {stats['total_corrections']}")

# Close
service.close()

Using SQLite CLI

# Open database
sqlite3 ~/.transcript-fixer/corrections.db

# View active corrections
SELECT from_text, to_text, domain FROM active_corrections;

# View statistics
SELECT * FROM correction_statistics;

# View pending suggestions
SELECT * FROM pending_suggestions;

# Check schema version
SELECT value FROM system_config WHERE key = 'schema_version';

Import/Export

Export to JSON

service = _get_service()
corrections = service.export_corrections(domain="general")

# Write to file
import json
with open("export.json", "w", encoding="utf-8") as f:
    json.dump({
        "version": "2.0",
        "domain": "general",
        "corrections": corrections
    }, f, ensure_ascii=False, indent=2)

Import from JSON

import json

with open("import.json", "r", encoding="utf-8") as f:
    data = json.load(f)

service = _get_service()
inserted, updated, skipped = service.import_corrections(
    corrections=data["corrections"],
    domain=data.get("domain", "general"),
    merge=True,
    validate_all=True
)

print(f"Imported: {inserted} new, {updated} updated, {skipped} skipped")

Backup Strategy

Automatic Backups

The system maintains database integrity through SQLite's ACID guarantees and automatic journaling.

Manual Backups

# Backup database
cp ~/.transcript-fixer/corrections.db ~/backups/corrections_$(date +%Y%m%d).db

# Or use SQLite backup
sqlite3 ~/.transcript-fixer/corrections.db ".backup ~/backups/corrections.db"

Version Control

Recommended: Use Git for configuration and export files, but NOT for the database:

# .gitignore
*.db
*.db-journal
*.bak

Instead, export corrections periodically:

python scripts/fix_transcription.py --export-json corrections_backup.json
git add corrections_backup.json
git commit -m "Backup corrections"

Best Practices

Regular Exports: Export to JSON weekly for team sharing
Database Backups: Backup .db file before major changes
Use Transactions: All modifications use ACID transactions automatically
Soft Deletes: Records are marked inactive, not deleted (preserves audit trail)
Validate: Run --validate after manual database changes
Statistics: Check usage patterns via correction_statistics view
Cleanup: Old history can be archived (query by run_timestamp)

Troubleshooting

Database Locked

# Check for lingering connections
lsof ~/.transcript-fixer/corrections.db

# If needed, backup and recreate
cp corrections.db corrections_backup.db
sqlite3 corrections.db "VACUUM;"

Corrupted Database

# Check integrity
sqlite3 corrections.db "PRAGMA integrity_check;"

# Recover if possible
sqlite3 corrections.db ".recover" | sqlite3 corrections_new.db

Missing Tables

# Reinitialize schema (safe, uses IF NOT EXISTS)
python -c "from core import CorrectionRepository; from pathlib import Path; CorrectionRepository(Path.home() / '.transcript-fixer' / 'corrections.db')"

11 KiB Raw Blame History