---
name: audio-transcriber
description: "Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration"
category: content
risk: safe
source: community
tags: "[audio, transcription, whisper, meeting-minutes, speech-to-text]"
date_added: "2026-02-27"
---

## Purpose

This skill automates audio-to-text transcription with professional Markdown output. It extracts technical metadata (speakers, timestamps, language, file size, duration) and generates structured meeting minutes and executive summaries. It uses Faster-Whisper or Whisper with zero configuration and works across projects without hardcoded paths or API keys.

Inspired by tools like Plaud, this skill transforms raw audio recordings into actionable documentation, making it ideal for meetings, interviews, lectures, and content analysis.

## When to Use

Invoke this skill when:

- User needs to transcribe audio/video files to text
- User wants meeting minutes automatically generated from recordings
- User requires speaker identification (diarization) in conversations
- User needs subtitles/captions (SRT, VTT formats)
- User wants executive summaries of long audio content
- User asks variations of "transcribe this audio", "convert audio to text", "generate meeting notes from recording"
- User has audio files in common formats (MP3, WAV, M4A, OGG, FLAC, WEBM)

## Workflow

### Step 0: Discovery (Auto-detect Transcription Tools)

**Objective:** Identify available transcription engines without user configuration.

**Actions:**

Run detection commands to find installed tools:

```bash
# Check for Faster-Whisper (preferred - 4-5x faster)
if python3 -c "import faster_whisper" 2>/dev/null; then
    TRANSCRIBER="faster-whisper"
    echo "✅ Faster-Whisper detected (optimized)"
# Fallback to original Whisper
elif python3 -c "import whisper" 2>/dev/null; then
    TRANSCRIBER="whisper"
    echo "✅ OpenAI Whisper detected"
else
    TRANSCRIBER="none"
    echo "⚠️ No transcription tool found"
fi

# Check for ffmpeg (audio format conversion)
if command -v ffmpeg &>/dev/null; then
    echo "✅ ffmpeg available (format conversion enabled)"
else
    echo "ℹ️ ffmpeg not found (limited format support)"
fi
```

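The same discovery logic can also be sketched in Python, which may be convenient when the rest of the pipeline is a Python script. This is a minimal sketch, not part of the skill itself: the function names `detect_transcriber` and `has_ffmpeg` are hypothetical, and the injectable `find_spec` / `which` parameters exist only to make the logic easy to test. Note that `importlib.util.find_spec` checks whether a module can be located without importing it, so it is slightly more lenient than the `python3 -c "import ..."` probe above.

```python
import importlib.util
import shutil


def detect_transcriber(find_spec=importlib.util.find_spec):
    """Return "faster-whisper", "whisper", or "none", in order of preference."""
    # Module names match the imports probed by the shell snippet above.
    candidates = (("faster_whisper", "faster-whisper"), ("whisper", "whisper"))
    for module_name, label in candidates:
        if find_spec(module_name) is not None:
            return label
    return "none"


def has_ffmpeg(which=shutil.which):
    """Return True when an ffmpeg executable is found on PATH."""
    return which("ffmpeg") is not None


if __name__ == "__main__":
    print("transcriber:", detect_transcriber())
    print("ffmpeg available:", has_ffmpeg())
```
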
**If no transcriber found:**

Offer automatic installation using the provided script:

```bash
echo "⚠️ No transcription tool found"
echo ""
echo "🔧 Auto-install dependencies? (Recommended)"
read -p "Run installation script? [Y/n]: " AUTO_INSTALL

if [[ ! "$AUTO_INSTALL" =~ ^[Nn] ]]; then
    # Get skill directory (works for both repo and symlinked installations)
    SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

    # Run installation script
    if [[ -f "$SKILL_DIR/scripts/install-requirements.sh" ]]; then
        bash "$SKILL_DIR/scripts/install-requirements.sh"
    else
        echo "❌ Installation script not found"
        echo ""
        echo "📦 Manual installation:"
        echo " pip install faster-whisper # Recommended"
        echo " pip install openai-whisper # Alternative"
        echo " brew install ffmpeg # Optional (macOS)"
        exit 1
    fi

    # Verify installation succeeded
    if python3 -c "import faster_whisper" 2>/dev/null || python3 -c "import whisper" 2>/dev/null; then
        echo "✅ Installation successful! Proceeding with transcription..."
    else
        echo "❌ Installation failed. Please install manually."
        exit 1
    fi
else
    echo ""
    echo "📦 Manual installation required:"
    echo ""
    echo "Recommended (fastest):"
    echo " pip install faster-whisper"
    echo ""
    echo "Alternative (original):"
    echo " pip install openai-whisper"
    echo ""
    echo "Optional (format conversion):"
    echo " brew install ffmpeg # macOS"
    echo " apt install ffmpeg # Linux"
    echo ""
    exit 1
fi
```

This ensures users can install dependencies with one confirmation, or opt for manual installation if preferred.

**If transcriber found:**

Proceed to Step 0b (CLI Detection).

### Step 1: Validate Audio File

**Objective:** Verify file exists, check format, and extract metadata.

**Actions:**

1. **Accept file path or URL** from user:
   - Local file: `meeting.mp3`
   - URL: `https://example.com/audio.mp3` (download to temp directory)

2. **Verify file exists:**

```bash
if [[ ! -f "$AUDIO_FILE" ]]; then
    echo "❌ File not found: $AUDIO_FILE"
    exit 1
fi
```

3. **Extract metadata** using ffprobe or file utilities:

```bash
# Get file size
FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)

# Get duration and format using ffprobe
DURATION=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)
FORMAT=$(ffprobe -v error -select_streams a:0 -show_entries \
    stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)

# Convert duration to HH:MM:SS.
# ffprobe reports fractional seconds (e.g. 2732.48), so keep only the integer
# part and use shell arithmetic; this behaves the same on macOS and Linux.
if [[ "$DURATION" =~ ^[0-9]+ ]]; then
    SECS=${BASH_REMATCH[0]}
    DURATION_HMS=$(printf '%02d:%02d:%02d' \
        $((SECS / 3600)) $((SECS % 3600 / 60)) $((SECS % 60)))
else
    DURATION_HMS="Unknown"
fi
```

4. **Check file size** (warn if large for cloud APIs):

```bash
SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
if [[ $SIZE_MB -gt 25 ]]; then
    echo "⚠️ Large file ($FILE_SIZE) - processing may take several minutes"
fi
```

5. **Validate format** (supported: MP3, WAV, M4A, OGG, FLAC, WEBM, MP4):

```bash
EXTENSION="${AUDIO_FILE##*.}"
# Lowercase with tr so the check also works on Bash 3.2 (the macOS default),
# which does not support the ${VAR,,} expansion.
EXTENSION_LC=$(printf '%s' "$EXTENSION" | tr '[:upper:]' '[:lower:]')
SUPPORTED_FORMATS=("mp3" "wav" "m4a" "ogg" "flac" "webm" "mp4")

if [[ ! " ${SUPPORTED_FORMATS[*]} " =~ " ${EXTENSION_LC} " ]]; then
    echo "⚠️ Unsupported format: $EXTENSION"
    if command -v ffmpeg &>/dev/null; then
        echo "🔄 Converting to WAV..."
        ffmpeg -i "$AUDIO_FILE" -ar 16000 "${AUDIO_FILE%.*}.wav" -y
        AUDIO_FILE="${AUDIO_FILE%.*}.wav"
    else
        echo "❌ Install ffmpeg to convert formats: brew install ffmpeg"
        exit 1
    fi
fi
```

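The validation checks in this step can also be expressed as small pure functions, which makes them easy to unit-test. The sketch below is illustrative only: the helper names (`format_duration`, `is_supported`, `is_large`) are hypothetical, and the size check works on a byte count rather than on `du -m` output, so results near the 25 MB threshold may differ slightly from the shell version.

```python
from pathlib import Path

# Same extension list as SUPPORTED_FORMATS in the shell snippet above.
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "ogg", "flac", "webm", "mp4"}
LARGE_FILE_MB = 25  # threshold used by the size warning above


def format_duration(seconds):
    """Convert a duration in (fractional) seconds to HH:MM:SS, truncating."""
    try:
        total = int(float(seconds))
    except (TypeError, ValueError, OverflowError):
        return "Unknown"  # e.g. ffprobe printed nothing or "N/A"
    if total < 0:
        return "Unknown"
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def is_supported(path):
    """Case-insensitive check of the file extension against the supported set."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def is_large(size_bytes):
    """Return True when the file exceeds the warning threshold."""
    return size_bytes > LARGE_FILE_MB * 1024 * 1024
```

For example, `format_duration("2732.48")` gives `00:45:32`, the duration shown in Example 1.
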
### Step 3: Generate Markdown Output

**Objective:** Create structured Markdown with metadata, transcription, meeting minutes, and summary.

**Output Template:**

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {filename} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
| **Processed Date** | {process_date} |
| **Speakers Identified** | {num_speakers} |
| **Transcription Engine** | {engine} (model: {model}) |

## 📋 Meeting Minutes

### Participants
- {speaker_1}
- {speaker_2}
- ...

### Topics Discussed
1. **{topic_1}** ({timestamp})
   - {key_point_1}
   - {key_point_2}

2. **{topic_2}** ({timestamp})
   - {key_point_1}

### Decisions Made
- ✅ {decision_1}
- ✅ {decision_2}

### Action Items
- [ ] **{action_1}** - Assigned to: {speaker} - Due: {date_if_mentioned}
- [ ] **{action_2}** - Assigned to: {speaker}

*Generated by audio-transcriber skill v1.0.0*
*Transcription engine: {engine} | Processing time: {elapsed_time}s*
```

**Implementation:**

Use Python or bash with AI model (Claude/GPT) for intelligent summarization:

```python
def generate_meeting_minutes(segments):
    """Extract topics, decisions, action items from transcription."""

    # Group segments by topic (simple clustering by timestamps)
    topics = cluster_by_topic(segments)

    # Identify action items (keywords: "should", "will", "need to", "action")
    action_items = extract_action_items(segments)

    # Identify decisions (keywords: "decided", "agreed", "approved")
    decisions = extract_decisions(segments)

    return {
        "topics": topics,
        "decisions": decisions,
        "action_items": action_items
    }

def generate_summary(segments, max_paragraphs=5):
    """Create executive summary using AI (Claude/GPT via API or local model)."""

    full_text = " ".join([s["text"] for s in segments])

    # Use Chain of Density approach (from prompt-engineer frameworks)
    summary_prompt = f"""
    Summarize the following transcription in {max_paragraphs} concise paragraphs.
    Focus on key topics, decisions, and action items.

    Transcription:
    {full_text}
    """

    # Call AI model (placeholder - user can integrate Claude API or use local model)
    summary = call_ai_model(summary_prompt)

    return summary
```

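The helpers `extract_action_items` and `extract_decisions` are referenced above but not defined. One possible keyword-based sketch is shown below, using the keyword lists from the comments in `generate_meeting_minutes`. It assumes each segment is a dict with a `"text"` key (as `generate_summary` does); the `_matches` helper and the whole-word matching rule are illustrative choices, not the skill's confirmed implementation.

```python
import re

# Keyword lists taken from the comments in generate_meeting_minutes above.
ACTION_KEYWORDS = ("should", "will", "need to", "action")
DECISION_KEYWORDS = ("decided", "agreed", "approved")


def _matches(text, keywords):
    """True if any keyword occurs as a whole word or phrase, ignoring case."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def extract_action_items(segments):
    """Return the text of segments that look like action items."""
    return [s["text"].strip() for s in segments if _matches(s["text"], ACTION_KEYWORDS)]


def extract_decisions(segments):
    """Return the text of segments that look like decisions."""
    return [s["text"].strip() for s in segments if _matches(s["text"], DECISION_KEYWORDS)]
```

Plain keyword matching is noisy (for instance, "will" appears in many sentences that are not commitments), so in practice the result would likely be passed to the LLM for filtering rather than used directly.
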
**Output file naming:**

```bash
# v1.1.0: Use a timestamp to avoid overwriting earlier output
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
TRANSCRIPT_FILE="transcript-${TIMESTAMP}.md"
ATA_FILE="ata-${TIMESTAMP}.md"   # "ata" is Portuguese for meeting minutes

echo "$TRANSCRIPT_CONTENT" > "$TRANSCRIPT_FILE"
echo "✅ Transcript saved: $TRANSCRIPT_FILE"

if [[ -n "$ATA_CONTENT" ]]; then
    echo "$ATA_CONTENT" > "$ATA_FILE"
    echo "✅ Minutes saved: $ATA_FILE"
fi
```

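If the output step is driven from Python instead of the shell, the same naming scheme could look like the sketch below. The function names `output_paths` and `save_outputs` are hypothetical; the `now` parameter is there only so the timestamp can be fixed in tests.

```python
from datetime import datetime
from pathlib import Path


def output_paths(directory=".", now=None):
    """Return (transcript_path, minutes_path) sharing a single timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    base = Path(directory)
    return base / f"transcript-{stamp}.md", base / f"ata-{stamp}.md"


def save_outputs(transcript, minutes=None, directory=".", now=None):
    """Write the transcript, and the minutes only when they are non-empty."""
    transcript_path, minutes_path = output_paths(directory, now)
    transcript_path.write_text(transcript, encoding="utf-8")
    written = [transcript_path]
    if minutes:  # mirrors the shell check [[ -n "$ATA_CONTENT" ]]
        minutes_path.write_text(minutes, encoding="utf-8")
        written.append(minutes_path)
    return written
```

Because both file names are derived from one timestamp, the transcript and its minutes stay paired even when several recordings are processed in the same directory.
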
#### **SCENARIO A: User Provided Custom Prompt**

**Workflow:**

1. **Display user's prompt:**
```
📝 Prompt provided by the user:
┌──────────────────────────────────┐
│ [User's prompt preview]          │
└──────────────────────────────────┘
```

2. **Automatically improve with prompt-engineer (if available):**
```bash
🔧 Improving prompt with prompt-engineer...
[Invokes: gh copilot -p "improve this prompt: {user_prompt}"]
```

3. **Show both versions:**
```
✨ Improved version:
┌──────────────────────────────────┐
│ Role: You are a documenter...    │
│ Instructions: Transform...       │
│ Steps: 1) ... 2) ...             │
│ End Goal: ...                    │
└──────────────────────────────────┘

📝 Original version:
┌──────────────────────────────────┐
│ [User's original prompt]         │
└──────────────────────────────────┘
```

4. **Ask which to use:**
```bash
💡 Use improved version? [y/n] (default: y):
```

5. **Process with selected prompt:**
   - If "y": use improved
   - If "n": use original

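The selection logic in steps 4 and 5 is small enough to sketch as a single function. This is an illustrative sketch with a hypothetical name (`choose_prompt`); it treats any answer beginning with "n" as a refusal and everything else, including an empty answer, as acceptance of the default, which mirrors the `^[Nn]` check used by the installation prompt in Step 0.

```python
def choose_prompt(original, improved, answer):
    """Pick which prompt to send to the LLM based on the user's answer."""
    if improved is None:
        # prompt-engineer was not available, so there is nothing to choose.
        return original
    if answer.strip().lower().startswith("n"):
        return original
    # Empty input or any other answer falls through to the default (improved).
    return improved
```
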
#### **LLM Processing (Both Scenarios)**

Once prompt is finalized:

```python
import subprocess

from rich.progress import Progress, SpinnerColumn, TextColumn

def process_with_llm(transcript, prompt, cli_tool='claude'):
    full_prompt = f"{prompt}\n\n---\n\nTranscript:\n\n{transcript}"

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True
    ) as progress:
        progress.add_task(
            description=f"🤖 Processing with {cli_tool}...",
            total=None
        )

        if cli_tool == 'claude':
            result = subprocess.run(
                ['claude', '-'],
                input=full_prompt,
                capture_output=True,
                text=True,
                timeout=300  # 5 minutes
            )
        elif cli_tool == 'gh-copilot':
            result = subprocess.run(
                ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                capture_output=True,
                text=True,
                timeout=300
            )
        else:
            # Fail loudly instead of reaching the check below with no result
            raise ValueError(f"Unsupported CLI tool: {cli_tool}")

    if result.returncode == 0:
        return result.stdout.strip()
    else:
        return None
```

**Progress output:**
```
🤖 Processing with claude... ⠋
[After completion:]
✅ Minutes generated successfully!
```

#### **Final Output**

**Success (both files):**
```bash
💾 Saving files...

✅ Files created:
  - transcript-20260203-023045.md (raw transcript)
  - ata-20260203-023045.md (processed with LLM)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done! Total time: 3m 45s
```

**Transcript only (user declined LLM):**
```bash
💾 Saving files...

✅ File created:
  - transcript-20260203-023045.md

ℹ️ Minutes not generated (LLM processing declined by the user)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done!
```

### Step 5: Display Results Summary

**Objective:** Show completion status and next steps.

**Output:**

```bash
echo ""
echo "✅ Transcription Complete!"
echo ""
echo "📊 Results:"
echo " File: $OUTPUT_FILE"
echo " Language: $LANGUAGE"
echo " Duration: $DURATION_HMS"
echo " Speakers: $NUM_SPEAKERS"
echo " Words: $WORD_COUNT"
echo " Processing time: ${ELAPSED_TIME}s"
echo ""
echo "📝 Generated:"
echo " - $OUTPUT_FILE (Markdown report)"
# Print the next two lines only if alternative formats were generated:
echo " - ${OUTPUT_FILE%.*}.srt (Subtitles)"
echo " - ${OUTPUT_FILE%.*}.json (Structured data)"
echo ""
echo "🎯 Next steps:"
echo " 1. Review meeting minutes and action items"
echo " 2. Share report with participants"
echo " 3. Track action items to completion"
```

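The summary above lists an optional `.srt` subtitle file. A minimal sketch of how transcription segments could be rendered in SRT format follows. It assumes each segment is a dict with `start` and `end` times in seconds plus a `text` field; that segment shape and the function names are assumptions for illustration, not something the skill specifies.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```
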
## Example Usage

### **Example 1: Basic Transcription**

**User Input:**
```bash
copilot> transcribe audio to markdown: meeting-2026-02-02.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)
✅ ffmpeg available (format conversion enabled)

📂 File: meeting-2026-02-02.mp3
📊 Size: 12.3 MB
⏱️ Duration: 00:45:32

🎙️ Processing...
[████████████████████] 100%

✅ Language detected: Portuguese (pt-BR)
👥 Speakers identified: 4
📝 Generating Markdown output...

✅ Transcription Complete!

📊 Results:
  File: meeting-2026-02-02.md
  Language: pt-BR
  Duration: 00:45:32
  Speakers: 4
  Words: 6,842
  Processing time: 127s

📝 Generated:
  - meeting-2026-02-02.md (Markdown report)

🎯 Next steps:
  1. Review meeting minutes and action items
  2. Share report with participants
  3. Track action items to completion
```

### **Example 3: Batch Processing**

**User Input:**
```bash
copilot> transcribe these audio files: recordings/*.mp3
```

**Skill Output:**

```bash
📦 Batch mode: 5 files found
  1. team-standup.mp3
  2. client-call.mp3
  3. brainstorm-session.mp3
  4. product-demo.mp3
  5. retrospective.mp3

🎙️ Processing batch...

[1/5] team-standup.mp3 ✅ (2m 34s)
[2/5] client-call.mp3 ✅ (15m 12s)
[3/5] brainstorm-session.mp3 ✅ (8m 47s)
[4/5] product-demo.mp3 ✅ (22m 03s)
[5/5] retrospective.mp3 ✅ (11m 28s)

✅ Batch Complete!
📝 Generated 5 Markdown reports
⏱️ Total processing time: 6m 15s
```

### **Example 5: Large File Warning**

**User Input:**
```bash
copilot> transcribe audio to markdown: conference-keynote.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)

📂 File: conference-keynote.mp3
📊 Size: 87.2 MB
⏱️ Duration: 02:15:47
⚠️ Large file (87.2 MB) - processing may take several minutes

Continue? [Y/n]:
```

**User:** `Y`

```bash
🎙️ Processing... (this may take 10-15 minutes)
[████░░░░░░░░░░░░░░░░] 20% - Estimated time remaining: 12m
```

This skill is **platform-agnostic** and works in any terminal context where GitHub Copilot CLI is available. It does not depend on specific project configurations or external APIs, following the zero-configuration philosophy.