# Database Schema Reference **MUST read this before any database operations.** Database location: `~/.transcript-fixer/corrections.db` ## Core Tables ### corrections Main storage for correction mappings. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | from_text | TEXT | Error text to match (NOT NULL) | | to_text | TEXT | Correct replacement (NOT NULL) | | domain | TEXT | Domain: general, embodied_ai, finance, medical | | source | TEXT | 'manual', 'learned', 'imported' | | confidence | REAL | 0.0-1.0 | | added_by | TEXT | Username | | added_at | TIMESTAMP | Creation time | | usage_count | INTEGER | Times this correction was applied | | last_used | TIMESTAMP | Last time used | | notes | TEXT | Optional notes | | is_active | BOOLEAN | Active flag (1=active, 0=disabled) | **Constraint**: `UNIQUE(from_text, domain)` ### context_rules Regex-based context-aware correction rules. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | pattern | TEXT | Regex pattern (UNIQUE) | | replacement | TEXT | Replacement text | | description | TEXT | Rule description | | priority | INTEGER | Higher = processed first | | is_active | BOOLEAN | Active flag | ### learned_suggestions AI-learned patterns pending user review. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | from_text | TEXT | Detected error | | to_text | TEXT | Suggested correction | | domain | TEXT | Domain | | frequency | INTEGER | Occurrence count (≥1) | | confidence | REAL | AI confidence (0.0-1.0) | | first_seen | TIMESTAMP | First occurrence | | last_seen | TIMESTAMP | Last occurrence | | status | TEXT | 'pending', 'approved', 'rejected' | | reviewed_at | TIMESTAMP | Review time | | reviewed_by | TEXT | Reviewer | **Constraint**: `UNIQUE(from_text, to_text, domain)` ### correction_history Audit log for all correction runs. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | filename | TEXT | Input file name | | domain | TEXT | Domain used | | run_timestamp | TIMESTAMP | When run | | original_length | INTEGER | Original text length | | stage1_changes | INTEGER | Dictionary changes count | | stage2_changes | INTEGER | AI changes count | | model | TEXT | AI model used | | execution_time_ms | INTEGER | Processing time | | success | BOOLEAN | Success flag | | error_message | TEXT | Error if failed | ### correction_changes Detailed changes made in each correction run. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | history_id | INTEGER | FK → correction_history.id | | line_number | INTEGER | Line where change occurred | | from_text | TEXT | Original text | | to_text | TEXT | Corrected text | | rule_type | TEXT | 'context', 'dictionary', 'ai' | | rule_id | INTEGER | Reference to rule used | | context_before | TEXT | Text before change | | context_after | TEXT | Text after change | ### system_config Key-value configuration store. | Column | Type | Description | |--------|------|-------------| | key | TEXT | Config key (PRIMARY KEY) | | value | TEXT | Config value | | value_type | TEXT | 'string', 'int', 'float', 'boolean', 'json' | | description | TEXT | What this config does | | updated_at | TIMESTAMP | Last update | **Default configs**: - `schema_version`: '2.0' - `api_model`: 'GLM-4.6' - `learning_frequency_threshold`: 3 - `learning_confidence_threshold`: 0.8 - `history_retention_days`: 90 ### audit_log Comprehensive operations trail. | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | timestamp | TIMESTAMP | When occurred | | action | TEXT | Action type | | entity_type | TEXT | Table affected | | entity_id | INTEGER | Row ID | | user | TEXT | Who did it | | details | TEXT | JSON details | | success | BOOLEAN | Success flag | | error_message | TEXT | Error if failed | ## Views ### active_corrections Active corrections only, ordered by domain and from_text. ```sql SELECT * FROM active_corrections; ``` ### pending_suggestions Suggestions awaiting review, with example count. ```sql SELECT * FROM pending_suggestions WHERE confidence > 0.8; ``` ### correction_statistics Statistics per domain. ```sql SELECT * FROM correction_statistics; ``` ## Common Queries ```sql -- List all active corrections SELECT from_text, to_text, domain FROM active_corrections; -- Check pending high-confidence suggestions SELECT * FROM pending_suggestions WHERE confidence > 0.8 ORDER BY frequency DESC; -- Domain statistics SELECT domain, total_corrections, total_usage FROM correction_statistics; -- Recent correction history SELECT filename, stage1_changes, stage2_changes, run_timestamp FROM correction_history ORDER BY run_timestamp DESC LIMIT 10; -- Add new correction (use CLI instead for safety) INSERT INTO corrections (from_text, to_text, domain, source, confidence, added_by) VALUES ('错误词', '正确词', 'general', 'manual', 1.0, 'user'); -- Disable a correction UPDATE corrections SET is_active = 0 WHERE id = ?; ``` ## Schema Version Check current version: ```sql SELECT value FROM system_config WHERE key = 'schema_version'; ``` For complete schema including indexes and constraints, see `scripts/core/schema.sql`.