Release v1.8.0: Add transcript-fixer skill
## New Skill: transcript-fixer v1.0.0

Correct speech-to-text (ASR/STT) transcription errors through dictionary-based rules and AI-powered corrections with automatic pattern learning.

**Features:**
- Two-stage correction pipeline (dictionary + AI)
- Automatic pattern detection and learning
- Domain-specific dictionaries (general, embodied_ai, finance, medical)
- SQLite-based correction repository
- Team collaboration with import/export
- GLM API integration for AI corrections
- Cost optimization through dictionary promotion

**Use cases:**
- Correcting meeting notes, lecture recordings, or interview transcripts
- Fixing Chinese/English homophone errors and technical terminology
- Building domain-specific correction dictionaries
- Improving transcript accuracy through iterative learning

**Documentation:**
- Complete workflow guides in references/
- SQL query templates
- Troubleshooting guide
- Team collaboration patterns
- API setup instructions

**Marketplace updates:**
- Updated marketplace to v1.8.0
- Added transcript-fixer plugin (category: productivity)
- Updated README.md with skill description and use cases
- Updated CLAUDE.md with skill listing and counts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
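The two-stage pipeline listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual API: the function names, the plain `dict` rule store, and the inlined stage-2 comment are all assumptions; the real skill keeps its rules in SQLite and calls the GLM API for stage 2.

```python
def dictionary_pass(text: str, rules: dict[str, str]) -> str:
    """Stage 1: apply known ASR corrections verbatim (cheap, deterministic)."""
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text

def correct_transcript(text: str, rules: dict[str, str]) -> str:
    # Stage 2 would send the dictionary-corrected text to an AI model and
    # record accepted fixes, so that frequent ones can later be "promoted"
    # into the dictionary (the cost optimization mentioned above).
    # This sketch runs only the dictionary stage.
    return dictionary_pass(text, rules)

rules = {"clod code": "Claude Code"}
print(correct_transcript("I use clod code daily", rules))
# → I use Claude Code daily
```

The point of the split is cost: corrections that recur are handled by cheap string rules, and only novel errors ever reach the AI stage.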
**New file:** `transcript-fixer/scripts/utils/diff_formats/text_splitter.py` (33 lines)
```python
#!/usr/bin/env python3
"""
Text splitter utility for word-level diff generation

SINGLE RESPONSIBILITY: Split text into words while preserving structure
"""

from __future__ import annotations

import re


def split_into_words(text: str) -> list[str]:
    """
    Split text into words, preserving whitespace and punctuation

    This enables word-level diff generation for Chinese and English text

    Args:
        text: Input text to split

    Returns:
        List of word tokens (Chinese words, English words, numbers, punctuation)
    """
    # Pattern: Chinese chars, English words, numbers, non-alphanumeric chars
    pattern = r'[\u4e00-\u9fff]+|[a-zA-Z]+|[0-9]+|[^\u4e00-\u9fffa-zA-Z0-9]'
    return re.findall(pattern, text)


def read_file(file_path: str) -> str:
    """Read file contents"""
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()
```
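A quick usage sketch of `split_into_words`. The function body is repeated here with the same regex so the snippet runs standalone; the sample sentence is illustrative. Note how CJK runs, ASCII word runs, and digit runs each become one token, while every remaining character (punctuation, whitespace) becomes its own token — which is what lets the diff stay word-level for English and character-run-level for Chinese.

```python
import re

# Same pattern as in text_splitter.py above.
PATTERN = r'[\u4e00-\u9fff]+|[a-zA-Z]+|[0-9]+|[^\u4e00-\u9fffa-zA-Z0-9]'

def split_into_words(text: str) -> list[str]:
    # re.findall tries the alternatives left to right at each position,
    # so greedy runs of CJK/letters/digits win over the single-char fallback.
    return re.findall(PATTERN, text)

print(split_into_words("今天开会discuss了GLM API"))
# → ['今天开会', 'discuss', '了', 'GLM', ' ', 'API']
```

Because whitespace and punctuation survive as their own tokens, joining the token list back together reproduces the original text exactly — a useful invariant when rendering diffs.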