Release v1.8.0: Add transcript-fixer skill
## New Skill: transcript-fixer v1.0.0

Correct speech-to-text (ASR/STT) transcription errors through dictionary-based rules and AI-powered corrections with automatic pattern learning.

**Features:**
- Two-stage correction pipeline (dictionary + AI)
- Automatic pattern detection and learning
- Domain-specific dictionaries (general, embodied_ai, finance, medical)
- SQLite-based correction repository
- Team collaboration with import/export
- GLM API integration for AI corrections
- Cost optimization through dictionary promotion

**Use cases:**
- Correcting meeting notes, lecture recordings, or interview transcripts
- Fixing Chinese/English homophone errors and technical terminology
- Building domain-specific correction dictionaries
- Improving transcript accuracy through iterative learning

**Documentation:**
- Complete workflow guides in references/
- SQL query templates
- Troubleshooting guide
- Team collaboration patterns
- API setup instructions

**Marketplace updates:**
- Updated marketplace to v1.8.0
- Added transcript-fixer plugin (category: productivity)
- Updated README.md with skill description and use cases
- Updated CLAUDE.md with skill listing and counts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
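The two-stage pipeline listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual API: the function names, the plain `dict` rule store, and the inlined stage-2 comment are all assumptions; the real skill keeps its rules in SQLite and calls the GLM API for stage 2.

```python
def dictionary_pass(text: str, rules: dict[str, str]) -> str:
    """Stage 1: apply known ASR corrections verbatim (cheap, deterministic)."""
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text

def correct_transcript(text: str, rules: dict[str, str]) -> str:
    # Stage 2 would send the dictionary-corrected text to an AI model and
    # record accepted fixes, so that frequent ones can later be "promoted"
    # into the dictionary (the cost optimization mentioned above).
    # This sketch runs only the dictionary stage.
    return dictionary_pass(text, rules)

rules = {"clod code": "Claude Code"}
print(correct_transcript("I use clod code daily", rules))
# → I use Claude Code daily
```

The point of the split is cost: corrections that recur are handled by cheap string rules, and only novel errors ever reach the AI stage.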
**New file:** `transcript-fixer/scripts/utils/diff_formats/text_splitter.py` (33 lines)
```python
#!/usr/bin/env python3
"""
Text splitter utility for word-level diff generation

SINGLE RESPONSIBILITY: Split text into words while preserving structure
"""

from __future__ import annotations

import re


def split_into_words(text: str) -> list[str]:
    """
    Split text into words, preserving whitespace and punctuation

    This enables word-level diff generation for Chinese and English text

    Args:
        text: Input text to split

    Returns:
        List of word tokens (Chinese words, English words, numbers, punctuation)
    """
    # Pattern: Chinese chars, English words, numbers, non-alphanumeric chars
    pattern = r'[\u4e00-\u9fff]+|[a-zA-Z]+|[0-9]+|[^\u4e00-\u9fffa-zA-Z0-9]'
    return re.findall(pattern, text)


def read_file(file_path: str) -> str:
    """Read file contents"""
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()
```
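A quick usage sketch of `split_into_words`. The function body is repeated here with the same regex so the snippet runs standalone; the sample sentence is illustrative. Note how CJK runs, ASCII word runs, and digit runs each become one token, while every remaining character (punctuation, whitespace) becomes its own token — which is what lets the diff stay word-level for English and character-run-level for Chinese.

```python
import re

# Same pattern as in text_splitter.py above.
PATTERN = r'[\u4e00-\u9fff]+|[a-zA-Z]+|[0-9]+|[^\u4e00-\u9fffa-zA-Z0-9]'

def split_into_words(text: str) -> list[str]:
    # re.findall tries the alternatives left to right at each position,
    # so greedy runs of CJK/letters/digits win over the single-char fallback.
    return re.findall(PATTERN, text)

print(split_into_words("今天开会discuss了GLM API"))
# → ['今天开会', 'discuss', '了', 'GLM', ' ', 'API']
```

Because whitespace and punctuation survive as their own tokens, joining the token list back together reproduces the original text exactly — a useful invariant when rendering diffs.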