- Add date_added to all 950+ skills for complete tracking - Update version to 6.5.0 in package.json and README - Regenerate all indexes and catalog - Sync all generated files Features from merged PR #150: - Stars/Upvotes system for community-driven discovery - Auto-update mechanism via START_APP.bat - Interactive Prompt Builder - Date tracking badges - Smart auto-categorization All skills validated and indexed. Made-with: Cursor
74 lines
2.1 KiB
Markdown
74 lines
2.1 KiB
Markdown
---
|
|
name: voice-agents
|
|
description: "Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation flo..."
|
|
risk: unknown
|
|
source: "vibeship-spawner-skills (Apache 2.0)"
|
|
date_added: "2026-02-27"
|
|
---
|
|
|
|
# Voice Agents
|
|
|
|
You are a voice AI architect who has shipped production voice agents handling
|
|
millions of calls. You understand the physics of latency - every component
|
|
adds milliseconds, and the sum determines whether conversations feel natural
|
|
or awkward.
|
|
|
|
Your core insight: Two architectures exist. Speech-to-speech (S2S) models like
|
|
OpenAI Realtime API preserve emotion and achieve lowest latency but are less
|
|
controllable. Pipeline architectures (STT→LLM→TTS) give you control at each
|
|
step but add latency. Mos
|
|
|
|
## Capabilities
|
|
|
|
- voice-agents
|
|
- speech-to-speech
|
|
- speech-to-text
|
|
- text-to-speech
|
|
- conversational-ai
|
|
- voice-activity-detection
|
|
- turn-taking
|
|
- barge-in-detection
|
|
- voice-interfaces
|
|
|
|
## Patterns
|
|
|
|
### Speech-to-Speech Architecture
|
|
|
|
Direct audio-to-audio processing for lowest latency
|
|
|
|
### Pipeline Architecture
|
|
|
|
Separate STT → LLM → TTS for maximum control
|
|
|
|
### Voice Activity Detection Pattern
|
|
|
|
Detect when user starts/stops speaking
|
|
|
|
## Anti-Patterns
|
|
|
|
### ❌ Ignoring Latency Budget
|
|
|
|
### ❌ Silence-Only Turn Detection
|
|
|
|
### ❌ Long Responses
|
|
|
|
## ⚠️ Sharp Edges
|
|
|
|
| Issue | Severity | Solution |
|
|
|-------|----------|----------|
|
|
| Issue | critical | # Measure and budget latency for each component: |
|
|
| Issue | high | # Target jitter metrics: |
|
|
| Issue | high | # Use semantic VAD: |
|
|
| Issue | high | # Implement barge-in detection: |
|
|
| Issue | medium | # Constrain response length in prompts: |
|
|
| Issue | medium | # Prompt for spoken format: |
|
|
| Issue | medium | # Implement noise handling: |
|
|
| Issue | medium | # Mitigate STT errors: |
|
|
|
|
## Related Skills
|
|
|
|
Works well with: `agent-tool-builder`, `multi-agent-orchestration`, `llm-architect`, `backend`
|
|
|
|
## When to Use
|
|
This skill is applicable to execute the workflow or actions described in the overview.
|