---
name: azure-ai-voicelive-java
description: Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
risk: unknown
source: community
date_added: '2026-02-27'
---

# Azure AI VoiceLive SDK for Java

Real-time, bidirectional voice conversations with AI assistants using WebSocket technology.

## Installation

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

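If either variable is unset, `System.getenv` returns `null`, and the problem may only surface later when the client is built. The following is a minimal sketch of a fail-fast check, written in plain Java and not part of the SDK; the class name `VoiceLiveEnv` and the helper `require` are hypothetical names chosen for illustration.

```java
import java.util.Map;

/** Hypothetical helper (not part of the SDK): fail fast when a required variable is missing. */
final class VoiceLiveEnv {
    private VoiceLiveEnv() { }

    /** Returns the trimmed value of {@code name} from {@code env}, or throws if it is absent or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            // System.getenv() exposes the process environment as a Map<String, String>.
            String endpoint = require(System.getenv(), "AZURE_VOICELIVE_ENDPOINT");
            System.out.println("Endpoint configured: " + endpoint);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment in as a `Map` keeps the check easy to exercise in unit tests without touching the real process environment.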
## Authentication

### API Key

```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
    .buildAsyncClient();
```

### DefaultAzureCredential (Recommended)

```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.identity.DefaultAzureCredentialBuilder;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
```

## Key Concepts

| Concept | Description |
|---------|-------------|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |

### Audio Requirements

- **Sample Rate**: 24 kHz (24,000 Hz)
- **Bit Depth**: 16-bit PCM
- **Channels**: Mono (1 channel)
- **Format**: Signed PCM, little-endian

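The audio requirements above can be written down with the JDK alone. The sketch below uses `javax.sound.sampled.AudioFormat` and `java.nio.ByteBuffer` from the standard library; the class name `Pcm16Audio` and its helper methods are hypothetical and not part of the VoiceLive SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper: 24 kHz, 16-bit, mono, signed, little-endian PCM. */
final class Pcm16Audio {
    static final int SAMPLE_RATE_HZ = 24_000;
    static final int BITS_PER_SAMPLE = 16;
    static final int CHANNELS = 1;

    private Pcm16Audio() { }

    /** JDK description of the format: signed = true, bigEndian = false. */
    static AudioFormat format() {
        return new AudioFormat((float) SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS, true, false);
    }

    /** Number of bytes that hold {@code durationMs} milliseconds of audio in this format. */
    static int bytesForMillis(int durationMs) {
        long bytesPerSecond = (long) SAMPLE_RATE_HZ * (BITS_PER_SAMPLE / 8) * CHANNELS;
        return Math.toIntExact(bytesPerSecond * durationMs / 1000);
    }

    /** Encodes 16-bit samples as little-endian bytes (low byte first). */
    static byte[] toLittleEndianBytes(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (short sample : samples) {
            buffer.putShort(sample);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        System.out.println("Frame size (bytes): " + format().getFrameSize());
        System.out.println("Bytes in 20 ms: " + bytesForMillis(20));
    }
}
```

At these settings one second of audio is 48,000 bytes, so a 20 ms chunk is 960 bytes.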
## Core Workflow

### 1. Start Session

```java
import reactor.core.publisher.Mono;

client.startSession("gpt-4o-realtime-preview")
    .flatMap(session -> {
        System.out.println("Session started");

        // Subscribe to events
        session.receiveEvents()
            .subscribe(
                event -> System.out.println("Event: " + event.getType()),
                error -> System.err.println("Error: " + error.getMessage())
            );

        return Mono.just(session);
    })
    .block();
```

### 2. Configure Session Options

```java
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)                    // Sensitivity (0.0-1.0)
    .setPrefixPaddingMs(300)              // Audio before speech
    .setSilenceDurationMs(500)            // Silence to end turn
    .setInterruptResponse(true)           // Allow interruptions
    .setAutoTruncate(true)
    .setCreateResponse(true);

AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
    AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setInstructions("You are a helpful AI voice assistant.")
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
    .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
    .setInputAudioFormat(InputAudioFormat.PCM16)
    .setOutputAudioFormat(OutputAudioFormat.PCM16)
    .setInputAudioSamplingRate(24000)
    .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
    .setInputAudioEchoCancellation(new AudioEchoCancellation())
    .setInputAudioTranscription(transcription)
    .setTurnDetection(turnDetection);

// Send configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
```

### 3. Send Audio Input

```java
byte[] audioData = readAudioChunk(); // Your PCM16 audio data
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
```

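`readAudioChunk()` in the snippet above is a placeholder for your own audio source. As one possible shape for it, the sketch below reads fixed-size chunks from any `InputStream` (a file, or a microphone line wrapped in a stream). The class name `AudioChunkReader`, the two-argument signature, and the 960-byte chunk size (20 ms at 24 kHz, 16-bit mono) are assumptions made for illustration, not SDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: pulls fixed-size PCM16 chunks from a stream. */
final class AudioChunkReader {
    private AudioChunkReader() { }

    /**
     * Reads up to {@code chunkSize} bytes. The final chunk may be shorter;
     * an empty array means the stream is exhausted.
     */
    static byte[] readAudioChunk(InputStream in, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        while (filled < chunkSize) {
            int n = in.read(buffer, filled, chunkSize - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        return filled == chunkSize ? buffer : Arrays.copyOf(buffer, filled);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for real audio: 2400 bytes of silence.
        InputStream pcm = new ByteArrayInputStream(new byte[2400]);
        int count = 0;
        byte[] chunk;
        while ((chunk = readAudioChunk(pcm, 960)).length > 0) {
            count++;
            // In a real session each chunk would be passed to
            // session.sendInputAudio(BinaryData.fromBytes(chunk)).subscribe();
        }
        System.out.println("Chunks read: " + count);
    }
}
```

Looping until a full chunk is available matters because a single `InputStream.read` call may return fewer bytes than requested.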
### 4. Handle Events

```java
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();

    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
```

## Voice Configuration

### OpenAI Voices

```java
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
```

### Azure Voices

```java
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));

// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));

// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
    new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
```

## Function Calling

```java
import java.util.Arrays;

// parametersSchema is a placeholder for an object describing the function's arguments
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
    .setDescription("Get current weather for a location")
    .setParameters(BinaryData.fromObject(parametersSchema));

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setTools(Arrays.asList(weatherFunction))
    .setInstructions("You have access to weather information.");
```

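The snippet above leaves `parametersSchema` undefined. One way to build it, sketched below, is a nested `Map` in the shape of a JSON Schema object. This assumes that the service expects a JSON Schema for function parameters and that `BinaryData.fromObject` serializes the map to JSON; the class name `WeatherSchema` and the `location` property are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a JSON-Schema-shaped map for the get_weather parameters. */
final class WeatherSchema {
    private WeatherSchema() { }

    static Map<String, Object> parametersSchema() {
        // One string argument named "location" (an assumed parameter for illustration).
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("type", "string");
        location.put("description", "City name, e.g. Seattle");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("location", location);

        // Top-level schema: an object with the single required property.
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        schema.put("required", Arrays.asList("location"));
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(parametersSchema());
    }
}
```

`LinkedHashMap` keeps the keys in insertion order, which makes the serialized schema easier to read when debugging.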
## Best Practices

1. **Use the async client**: VoiceLive requires reactive patterns
2. **Configure turn detection** for natural conversation flow
3. **Enable noise reduction** for better speech recognition
4. **Handle interruptions** gracefully with `setInterruptResponse(true)`
5. **Use Whisper transcription** for input audio transcription
6. **Close sessions** properly when the conversation ends

## Error Handling

```java
import reactor.core.publisher.Flux;

session.receiveEvents()
    .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
    .onErrorResume(error -> {
        // Attempt reconnection or cleanup
        return Flux.empty();
    })
    .subscribe();
```

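The comment "Attempt reconnection or cleanup" leaves the retry policy open. A common choice, though not something the SDK is documented here as doing for you, is exponential backoff with a cap. The sketch below only computes the wait time for each attempt; the class name `ReconnectBackoff`, the 500 ms base, and the 10 s cap are hypothetical values for illustration.

```java
import java.time.Duration;

/** Hypothetical helper: capped exponential backoff delays for reconnect attempts. */
final class ReconnectBackoff {
    private ReconnectBackoff() { }

    /**
     * Delay before the given attempt (1-based): base, 2x base, 4x base, ...
     * never exceeding {@code max}.
     */
    static Duration delayForAttempt(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long cap = max.toMillis();
        long delay = Math.min(base.toMillis(), cap);
        for (int i = 1; i < attempt && delay < cap; i++) {
            // Jump to the cap instead of doubling past it (also avoids overflow).
            delay = delay > cap / 2 ? cap : delay * 2;
        }
        return Duration.ofMillis(delay);
    }

    public static void main(String[] args) {
        Duration base = Duration.ofMillis(500);
        Duration max = Duration.ofSeconds(10);
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("Attempt " + attempt + ": wait "
                + delayForAttempt(attempt, base, max).toMillis() + " ms");
        }
    }
}
```

With these values the waits are 500, 1000, 2000, 4000, 8000, and then 10000 ms for every later attempt.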
## Reference Links

| Resource | URL |
|----------|-----|
| GitHub Source | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive/src/samples |

## When to Use

Use this skill when building real-time, bidirectional voice conversations with AI assistants in Java using the Azure AI VoiceLive SDK.