firefrost-gaming/antigravity-skills-reference

Zied b42ab600ec feat: add in-app Sync Skills button and simplify START_APP.bat launcher

2026-03-02 09:56:15 +01:00

Common Pitfalls and Solutions

This document covers common issues encountered when building voice AI engines and their solutions.

1. Audio Jumping/Cutting Off

Problem

The bot's audio jumps or cuts off mid-response, creating a jarring user experience.

Symptoms

Audio plays in fragments
Sentences are incomplete
Multiple audio streams overlap
Unnatural pauses or gaps

Root Cause

Sending text to the synthesizer in small chunks (sentence-by-sentence or word-by-word) causes multiple TTS API calls. Each call generates a separate audio stream, resulting in:

Multiple audio files being played sequentially
Timing issues between chunks
Potential overlapping audio
Inconsistent voice characteristics between chunks

Solution

Buffer the entire LLM response before sending it to the synthesizer:

❌ Bad: Yields sentence-by-sentence

async def generate_response(self, prompt):
    async for sentence in llm_stream:
        # This creates multiple TTS calls!
        yield GeneratedResponse(message=BaseMessage(text=sentence))

✅ Good: Buffer entire response

async def generate_response(self, prompt):
    # Buffer the entire response
    full_response = ""
    async for chunk in llm_stream:
        full_response += chunk
    
    # Yield once with complete response
    yield GeneratedResponse(message=BaseMessage(text=full_response))

Why This Works

Single TTS call for the entire response
Consistent voice characteristics
Proper timing and pacing
No gaps or overlaps
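
The buffering step can be tried in isolation. The sketch below is a minimal, self-contained illustration, not the project's actual agent: `fake_llm_stream` is a hypothetical stand-in for a streaming LLM client, and `buffer_stream` is an illustrative helper name.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM client."""
    for token in ["Hello", ", ", "world", "."]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield token


async def buffer_stream(stream: AsyncIterator[str]) -> str:
    """Collect every chunk, then return the complete text in one piece."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


full_response = asyncio.run(buffer_stream(fake_llm_stream()))
print(full_response)
```

The complete string would then be passed to the synthesizer once, which is what keeps the response to a single TTS call.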

2. Echo/Feedback Loop

Problem

The bot hears itself speaking and responds to its own audio, creating an infinite loop.

Symptoms

Bot responds to its own speech
Conversation becomes nonsensical
Transcriptions include bot's own words
System becomes unresponsive

Root Cause

The transcriber continues to process audio while the bot is speaking. If the bot's audio is being played through speakers and captured by the microphone, the transcriber will transcribe the bot's own speech.

Solution

Mute the transcriber when the bot starts speaking:

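# Mute before sending audio to the output
self.transcriber.mute()
try:
    # Send audio...
    await self.send_speech_to_output(synthesis_result)
finally:
    # Unmute once playback completes, even if sending raised,
    # so the transcriber is never left muted by accident
    self.transcriber.unmute()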

Implementation in Transcriber

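import asyncio

class BaseTranscriber:
    def __init__(self):
        self.is_muted = False
        # Audio chunks waiting for the transcription worker
        # (assumed here to be an asyncio.Queue)
        self.input_queue = asyncio.Queue()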
    
    def send_audio(self, chunk: bytes):
        """Client calls this to send audio"""
        if not self.is_muted:
            self.input_queue.put_nowait(chunk)
        else:
            # Send silence instead (prevents echo)
            self.input_queue.put_nowait(self.create_silent_chunk(len(chunk)))
    
    def mute(self):
        """Called when bot starts speaking"""
        self.is_muted = True
    
    def unmute(self):
        """Called when bot stops speaking"""
        self.is_muted = False
    
    def create_silent_chunk(self, size: int) -> bytes:
        """Create a silent audio chunk"""
        return b'\x00' * size

Why This Works

Transcriber receives silence while bot speaks
No transcription of bot's own speech
Prevents feedback loop
Maintains audio stream continuity
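
One way to make sure the transcriber is always unmuted again is to wrap the mute/unmute pair in a context manager. The following is a sketch under stated assumptions: `DemoTranscriber` and `muted` are hypothetical names introduced only for illustration, and the demo class exposes nothing beyond the mute flag.

```python
from contextlib import contextmanager


class DemoTranscriber:
    """Hypothetical minimal transcriber exposing only the mute flag."""

    def __init__(self) -> None:
        self.is_muted = False

    def mute(self) -> None:
        self.is_muted = True

    def unmute(self) -> None:
        self.is_muted = False


@contextmanager
def muted(transcriber):
    """Mute on entry and always unmute on exit, even if playback fails."""
    transcriber.mute()
    try:
        yield transcriber
    finally:
        transcriber.unmute()


transcriber = DemoTranscriber()
with muted(transcriber):
    muted_during_playback = transcriber.is_muted  # flag while "speaking"

# Simulate a playback failure: the flag must still be reset afterwards.
try:
    with muted(transcriber):
        raise RuntimeError("playback failed")
except RuntimeError:
    pass

print(muted_during_playback, transcriber.is_muted)
```

A plain `with` block like this can also be used inside an `async def`, so the awaited playback call can sit in its body.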

3. Interrupts Not Working

Problem

Users cannot interrupt the bot mid-sentence. The bot continues speaking even when the user starts talking.

Symptoms

Bot speaks over user
User must wait for bot to finish
Unnatural conversation flow
Poor user experience

Root Cause

All audio chunks are sent to the client immediately, buffering the entire message on the client side. By the time an interrupt is detected, all audio has already been sent and is queued for playback.

Solution

Rate-limit audio chunks to match real-time playback:

❌ Bad: Send all chunks immediately

async for chunk in synthesis_result.chunk_generator:
    # Sends all chunks as fast as possible
    output_device.consume_nonblocking(chunk)

✅ Good: Rate-limit chunks

chunk_idx = 0
async for chunk in synthesis_result.chunk_generator:
    # Check for interrupt before sending the next chunk
    if stop_event.is_set():
        # Recover the text spoken so far from the elapsed audio time
        partial_message = synthesis_result.get_message_up_to(
            chunk_idx * seconds_per_chunk
        )
        return partial_message, True  # cut_off = True
    
    start_time = time.monotonic()
    
    # Send chunk
    output_device.consume_nonblocking(chunk)
    
    # CRITICAL: Wait for chunk duration before sending next
    processing_time = time.monotonic() - start_time
    await asyncio.sleep(max(seconds_per_chunk - processing_time, 0))
    
    chunk_idx += 1

Why This Works

Only about one chunk's worth of audio is waiting on the client at a time
Interrupts can stop mid-sentence
Natural conversation flow
Real-time playback maintained
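
The rate-limiting loop can be exercised without any audio hardware. This is a minimal sketch, assuming a synchronous `send` callback in place of the output device; `send_rate_limited` and `demo` are hypothetical names, and the interrupt is simulated by setting the event after the third chunk.

```python
import asyncio
import time
from typing import Callable, Iterable, Tuple


async def send_rate_limited(
    chunks: Iterable[bytes],
    send: Callable[[bytes], None],
    stop_event: asyncio.Event,
    seconds_per_chunk: float,
) -> Tuple[int, bool]:
    """Send chunks at playback speed; return (chunks_sent, cut_off)."""
    sent = 0
    for chunk in chunks:
        if stop_event.is_set():
            return sent, True  # interrupted before this chunk was sent
        start = time.monotonic()
        send(chunk)
        sent += 1
        elapsed = time.monotonic() - start
        # Sleep for the remainder of the chunk's playback time
        await asyncio.sleep(max(seconds_per_chunk - elapsed, 0))
    return sent, False


async def demo() -> Tuple[int, bool]:
    stop_event = asyncio.Event()
    received = []

    def send(chunk: bytes) -> None:
        received.append(chunk)
        if len(received) == 3:
            stop_event.set()  # simulate the user interrupting after 3 chunks

    chunks = [b"\x00" * 1024 for _ in range(10)]
    return await send_rate_limited(chunks, send, stop_event, 0.001)


result = asyncio.run(demo())
print(result)
```

Because the event is checked before every send, the remaining seven chunks never leave the server once the interrupt is flagged.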

Calculating `seconds_per_chunk`

# For LINEAR16 PCM audio at 16kHz
sample_rate = 16000  # Hz
chunk_size = 1024    # bytes
bytes_per_sample = 2  # 16-bit = 2 bytes

samples_per_chunk = chunk_size / bytes_per_sample
seconds_per_chunk = samples_per_chunk / sample_rate
# = 1024 / 2 / 16000 = 0.032 seconds
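
The same arithmetic can be wrapped in a small helper. This is a sketch only: the function and its `channels` parameter are additions for illustration (the calculation above assumes mono audio).

```python
def seconds_per_chunk(chunk_size: int, sample_rate: int,
                      bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Playback duration of one raw PCM chunk, in seconds."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return chunk_size / bytes_per_second


small = seconds_per_chunk(1024, 16000)  # 1024-byte chunk, 16 kHz, 16-bit mono
large = seconds_per_chunk(8192, 16000)  # 8192-byte chunk, same format
print(small, large)
```

For 16-bit mono audio at 16 kHz this gives 0.032 seconds for a 1024-byte chunk and 0.256 seconds for an 8192-byte chunk.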

4. Memory Leaks from Unclosed Streams

Problem

Memory usage grows over time, eventually causing the application to crash.

Symptoms

Increasing memory usage
Slow performance over time
WebSocket connections not closing
Resource exhaustion

Root Cause

WebSocket connections, API streams, or async tasks are not properly closed when conversations end or errors occur.

Solution

Always clean up in a `finally` block (or a context manager) so resources are released even when an error occurs:

❌ Bad: No cleanup

async def handle_conversation(websocket):
    conversation = create_conversation()
    await conversation.start()
    
    async for message in websocket.iter_bytes():
        conversation.receive_audio(message)
    # No cleanup! Resources leak

✅ Good: Proper cleanup

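import logging

# Assumption: FastAPI/Starlette, as in the endpoint shown in section 6
from fastapi import WebSocketDisconnect

logger = logging.getLogger(__name__)

async def handle_conversation(websocket):
    conversation = None
    try:
        conversation = create_conversation()
        await conversation.start()
        
        async for message in websocket.iter_bytes():
            conversation.receive_audio(message)
            
    except WebSocketDisconnect:
        logger.info("Client disconnected")
    except Exception as e:
        logger.error(f"Error: {e}", exc_info=True)
    finally:
        # Always clean up, whether the loop ended normally or with an error
        if conversation:
            await conversation.terminate()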

Proper Termination

async def terminate(self):
    """Gracefully shut down all workers"""
    self.active = False
    
    # Stop all workers
    self.transcriber.terminate()
    self.agent.terminate()
    self.synthesizer.terminate()
    
    # Give the queues a moment to drain
    await asyncio.sleep(0.5)
    
    # Close connections
    if self.websocket:
        await self.websocket.close()
    
    # Cancel tasks that are still running
    for task in self.tasks:
        if not task.done():
            task.cancel()
    
    # Wait for the cancellations to finish so no task is left pending
    await asyncio.gather(*self.tasks, return_exceptions=True)
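
The task-cancellation step is easy to get subtly wrong, because `cancel()` only requests cancellation; the task still has to run its cleanup code. The sketch below shows cancelling and then awaiting the tasks with dummy workers. `worker`, `cancel_and_wait`, and `demo` are hypothetical names used only for this illustration.

```python
import asyncio
from typing import List


async def worker(name: str, log: List[str]) -> None:
    """Hypothetical long-running worker that records its own cleanup."""
    try:
        while True:
            await asyncio.sleep(3600)
    finally:
        log.append(f"{name} cleaned up")


async def cancel_and_wait(tasks: List[asyncio.Task]) -> None:
    """Cancel unfinished tasks, then wait until they have actually stopped."""
    for task in tasks:
        if not task.done():
            task.cancel()
    # return_exceptions=True keeps CancelledError from propagating here
    await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    log: List[str] = []
    tasks = [asyncio.create_task(worker(n, log)) for n in ("transcriber", "agent")]
    await asyncio.sleep(0)  # let the workers start
    await cancel_and_wait(tasks)
    return log, [t.cancelled() for t in tasks]


log, cancelled = asyncio.run(demo())
print(log, cancelled)
```

Awaiting the cancelled tasks gives each worker's `finally` block a chance to run before the conversation object is discarded.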

5. Conversation History Not Updating

Problem

The agent doesn't remember previous messages or context is lost.

Symptoms

Agent repeats itself
No context from previous messages
Each response is independent
Poor conversation quality

Root Cause

Conversation history is not being maintained or updated correctly.

Solution

Maintain conversation history in the agent:

class Agent:
    def __init__(self):
        self.conversation_history = []
    
    async def generate_response(self, user_input):
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_input
        })
        
        # Generate response with full history
        response = await self.llm.generate(self.conversation_history)
        
        # Add bot response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": response
        })
        
        return response

Handling Interrupts

When the bot is interrupted, update history with partial message:

def update_last_bot_message_on_cut_off(self, partial_message):
    """Update history when bot is interrupted"""
    if self.conversation_history and \
       self.conversation_history[-1]["role"] == "assistant":
        # Update with what was actually spoken
        self.conversation_history[-1]["content"] = partial_message
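
To see how the two pieces fit together, here is a small runnable sketch. `EchoLLM` is a hypothetical stand-in for a real LLM client and is not a real API; the `Agent` class mirrors the methods above.

```python
import asyncio


class EchoLLM:
    """Hypothetical stand-in for an LLM client; not a real API."""

    async def generate(self, history):
        return f"You said: {history[-1]['content']}. Anything else?"


class Agent:
    def __init__(self, llm):
        self.llm = llm
        self.conversation_history = []

    async def generate_response(self, user_input):
        self.conversation_history.append({"role": "user", "content": user_input})
        response = await self.llm.generate(self.conversation_history)
        self.conversation_history.append({"role": "assistant", "content": response})
        return response

    def update_last_bot_message_on_cut_off(self, partial_message):
        history = self.conversation_history
        if history and history[-1]["role"] == "assistant":
            history[-1]["content"] = partial_message


async def demo():
    agent = Agent(EchoLLM())
    await agent.generate_response("hello")
    # The user interrupts after the bot has only spoken the first sentence.
    agent.update_last_bot_message_on_cut_off("You said: hello.")
    await agent.generate_response("never mind")
    return agent.conversation_history


history = asyncio.run(demo())
for turn in history:
    print(turn["role"], "->", turn["content"])
```

After the interrupt, the history holds only the words that were actually spoken, so the next LLM call does not assume the user heard the rest.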

6. WebSocket Connection Drops

Problem

WebSocket connections drop unexpectedly, interrupting conversations.

Symptoms

Frequent disconnections
Connection timeouts
"Connection closed" errors
Unstable conversations

Root Cause

No heartbeat/ping mechanism
Idle timeout
Network issues
Server overload

Solution

Send a periodic heartbeat from the server so idle connections stay open, and have the client reconnect if the connection still drops:

@app.websocket("/conversation")
async def conversation_endpoint(websocket: WebSocket):
    await websocket.accept()
    
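    # Start heartbeat
    async def heartbeat():
        while True:
            try:
                await websocket.send_json({"type": "ping"})
                await asyncio.sleep(30)  # Ping every 30 seconds
            except Exception:
                # Socket is closed or broken; stop pinging.
                # Cancellation is not caught here, so cancel() still works.
                break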
    
    heartbeat_task = asyncio.create_task(heartbeat())
    
    try:
        async for message in websocket.iter_bytes():
            # Process message
            pass
    finally:
        heartbeat_task.cancel()
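
The reconnection half lives on the client. A minimal sketch of a retry loop with exponential backoff follows; `connect_with_backoff` and `flaky_connect` are hypothetical names, and the `connect` argument stands in for whatever coroutine opens the WebSocket in a given client.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Retry `connect` with exponentially growing delays between attempts."""
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError:  # ConnectionError is a subclass of OSError
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            await asyncio.sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")


async def demo():
    attempts = 0

    async def flaky_connect() -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("server unavailable")
        return "connected"

    result = await connect_with_backoff(flaky_connect, base_delay=0.001)
    return result, attempts


result, attempts = asyncio.run(demo())
print(result, attempts)
```

Capping the delay with `max_delay` keeps a long outage from pushing the retry interval to unreasonable lengths.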

7. High Latency / Slow Responses

Problem

Long delays between user speech and bot response.

Symptoms

Noticeable lag
Poor user experience
Conversation feels unnatural
Users repeat themselves

Root Causes & Solutions

1. Not using streaming

# ❌ Bad: Wait for entire response
response = await llm.complete(prompt)

# ✅ Good: Stream response
async for chunk in llm.complete(prompt, stream=True):
    yield chunk

2. Sequential processing

# ❌ Bad: Sequential
transcription = await transcriber.transcribe(audio)
response = await agent.generate(transcription)
audio = await synthesizer.synthesize(response)

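# ✅ Good: Concurrent workers connected by queues
# The transcriber, agent, and synthesizer each run as their own task,
# so every stage starts as soon as the previous one emits output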
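
A queue-connected pipeline can be sketched with plain `asyncio`. The example below is illustrative only: the three stages are string-tagging lambdas standing in for the transcriber, agent, and synthesizer, and `stage`, `run_pipeline`, and `STOP` are hypothetical names.

```python
import asyncio

STOP = object()  # sentinel that tells the next stage to shut down


async def stage(inbox: asyncio.Queue, outbox: asyncio.Queue, transform) -> None:
    """Generic worker: read an item, transform it, pass it downstream."""
    while True:
        item = await inbox.get()
        if item is STOP:
            await outbox.put(STOP)
            return
        await outbox.put(transform(item))


async def run_pipeline(audio_chunks):
    audio_q, text_q, reply_q, speech_q = (asyncio.Queue() for _ in range(4))
    workers = [
        asyncio.create_task(stage(audio_q, text_q, lambda a: f"text({a})")),
        asyncio.create_task(stage(text_q, reply_q, lambda t: f"reply({t})")),
        asyncio.create_task(stage(reply_q, speech_q, lambda r: f"speech({r})")),
    ]
    for chunk in audio_chunks:
        await audio_q.put(chunk)
    await audio_q.put(STOP)

    results = []
    while (item := await speech_q.get()) is not STOP:
        results.append(item)
    await asyncio.gather(*workers)
    return results


output = asyncio.run(run_pipeline(["a1", "a2"]))
print(output)
```

Each stage only waits on its own inbox, so a slow synthesizer does not stop the transcriber from accepting new audio.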

3. Large chunk sizes

# ❌ Bad: Large chunks (high latency)
chunk_size = 8192  # 0.256 seconds

# ✅ Good: Small chunks (low latency)
chunk_size = 1024  # 0.032 seconds

8. Audio Quality Issues

Problem

Poor audio quality, distortion, or artifacts.

Symptoms

Robotic voice
Crackling or popping
Distorted audio
Inconsistent volume

Root Causes & Solutions

1. Wrong audio format

# ✅ Use LINEAR16 PCM at 16kHz
audio_encoding = AudioEncoding.LINEAR16
sample_rate = 16000

2. Incorrect format conversion

# ✅ Proper MP3 to PCM conversion
# (pydub needs ffmpeg or libav installed to decode MP3)
from pydub import AudioSegment
import io

def mp3_to_pcm(mp3_bytes):
    audio = AudioSegment.from_mp3(io.BytesIO(mp3_bytes))
    audio = audio.set_frame_rate(16000)
    audio = audio.set_channels(1)
    audio = audio.set_sample_width(2)  # 16-bit
    return audio.raw_data

3. Buffer underruns

# ✅ Ensure consistent chunk timing
await asyncio.sleep(max(seconds_per_chunk - processing_time, 0))

Summary

Problem	Root Cause	Solution
Audio jumping	Multiple TTS calls	Buffer entire response
Echo/feedback	Transcriber active during bot speech	Mute transcriber
Interrupts not working	All chunks sent immediately	Rate-limit chunks
Memory leaks	Unclosed streams	Proper cleanup
Lost context	History not maintained	Update conversation history
Connection drops	No heartbeat	Implement ping/pong
High latency	Sequential processing	Use streaming + queues
Poor audio quality	Wrong format/conversion	Use LINEAR16 PCM 16kHz

Best Practices

Always buffer LLM responses before sending to synthesizer
Always mute transcriber when bot is speaking
Always rate-limit audio chunks to enable interrupts
Always clean up resources in `finally` blocks
Always maintain conversation history for context
Always use streaming for low latency
Always use LINEAR16 PCM at 16kHz for audio
Always implement error handling in worker loops
