Voice Integration
Integrate Alfred's voice-first AI into your applications, with support for phone calls, web widgets, webhooks, and multiple voice engines.
Architecture Overview
Alfred Voice uses a real-time voice transport layer built into the platform. Calls arrive by phone or web widget, pass through the Alfred Voice API, and are routed to Alfred's tool engine for execution.
Voice Call Flow
STT → Alfred Processing → TTS, with real-time tool execution in the middle
Key Components
- Alfred Voice API — Real-time voice transport and speech-to-text/text-to-speech pipeline
- Webhook Server — Your endpoint that receives voice events and returns tool results
- Tool Engine — Alfred's 1,220+ tools, executable via voice or API
- Voice Engines — Kokoro, Orpheus, Cartesia Sonic, ElevenLabs for TTS output
- Conference Rooms — Multi-agent voice collaboration spaces
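The call flow above can be pictured as three stages chained per utterance. The sketch below is a toy model only: `transcribe`, `process`, `synthesize`, and the one-entry `TOOLS` registry are hypothetical stand-ins, not Alfred API functions, and text bytes stand in for audio.

```python
from typing import Callable, Dict

# Hypothetical tool registry; the real tools live in Alfred's tool engine.
TOOLS: Dict[str, Callable[[str], str]] = {
    "weather_lookup": lambda location: f"Weather for {location}: (stubbed)",
}


def transcribe(audio: bytes) -> str:
    """Stand-in for STT: treats the incoming bytes as UTF-8 text."""
    return audio.decode("utf-8")


def process(text: str) -> str:
    """Stand-in for Alfred processing: runs a tool when the text asks for one."""
    prefix = "weather in "
    lowered = text.lower()
    if prefix in lowered:
        start = lowered.index(prefix) + len(prefix)
        location = text[start:].strip(" ?.")
        return TOOLS["weather_lookup"](location)
    return "Sorry, I did not catch that."


def synthesize(text: str) -> bytes:
    """Stand-in for TTS: encodes the reply text as bytes."""
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    """STT -> processing (with tool execution) -> TTS."""
    return synthesize(process(transcribe(audio)))
```

The point of the sketch is the ordering: tool execution happens inside the processing stage, between speech recognition and speech output.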
Voice API Setup
The Alfred Voice API handles the real-time voice connection between users and Alfred. You'll need to configure a voice assistant and point it to your Alfred webhook.
Step 1: Get Your Voice API Key
Navigate to your Developer Portal and generate a Voice API key from the dashboard.
Step 2: Create an Assistant
curl -X POST https://gositeme.com/api/voice-assistant \
-H "Authorization: Bearer your_voice_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Alfred AI",
"model": {
"provider": "custom-llm",
"url": "https://gositeme.com/api/vapi-webhook.php",
"model": "alfred-v2"
},
"voice": {
"provider": "cartesia",
"voiceId": "sonic-english-male-1"
},
"firstMessage": "Hello! I am Alfred, your AI assistant. How can I help you today?",
"serverUrl": "https://gositeme.com/api/vapi-webhook.php"
}'
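If you prefer to create the assistant from Python, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_assistant_payload` and `build_request` are illustrative, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

ASSISTANT_URL = "https://gositeme.com/api/voice-assistant"
WEBHOOK_URL = "https://gositeme.com/api/vapi-webhook.php"


def build_assistant_payload(name: str, voice_id: str, first_message: str) -> dict:
    """Return the JSON body used in the curl example (helper name is illustrative)."""
    return {
        "name": name,
        "model": {"provider": "custom-llm", "url": WEBHOOK_URL, "model": "alfred-v2"},
        "voice": {"provider": "cartesia", "voiceId": voice_id},
        "firstMessage": first_message,
        "serverUrl": WEBHOOK_URL,
    }


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST request carrying the Voice API key."""
    return urllib.request.Request(
        ASSISTANT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(...)` and read the body; keeping payload construction separate from sending makes the payload easy to inspect before any call is made.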
Step 3: Configure Server URL
Point your voice assistant's server URL to Alfred's webhook endpoint (the serverUrl value used in Step 2):
https://gositeme.com/api/vapi-webhook.php
This endpoint receives all voice events: call start, speech input, tool calls, and call end.
Webhook Configuration
Alfred exposes multiple webhook endpoints for voice event handling:
| Endpoint | Purpose |
|---|---|
| /api/vapi-webhook.php | Main voice event handler (call events, messages, tool calls) |
| /api/vapi-callback.php | Callback handler for async tool results |
| /api/vapi-tools.php | Tool registration endpoint for voice function calling |
| /api/vapi-auth.php | Voice session authentication |
| /api/vapi-outbound.php | Outbound call initiation |
Webhook Event Types
The main webhook receives events in this format:
// Call started
{
"message": {
"type": "assistant-request",
"call": {
"id": "call_abc123",
"phoneNumber": "+18334674836",
"customer": { "number": "+15145551234" }
}
}
}
// Tool call request
{
"message": {
"type": "tool-calls",
"toolCalls": [
{
"id": "tc_xyz",
"type": "function",
"function": {
"name": "weather_lookup",
"arguments": "{\"location\":\"Montreal\"}"
}
}
]
}
}
// Call ended
{
"message": {
"type": "end-of-call-report",
"call": { "id": "call_abc123" },
"summary": "User asked about weather in Montreal",
"duration": 45
}
}
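When debugging a webhook, it can help to reduce each event to a one-line summary. The sketch below reads only the fields shown in the three sample payloads above; `summarize_event` is an illustrative helper, and any other event types or fields are simply reported as unhandled.

```python
from typing import Any, Dict


def summarize_event(payload: Dict[str, Any]) -> str:
    """Return a one-line description of a webhook event (illustrative helper)."""
    message = payload.get("message", {})
    event_type = message.get("type", "")

    if event_type == "assistant-request":
        call = message.get("call", {})
        caller = call.get("customer", {}).get("number", "unknown")
        return f"call {call.get('id', 'unknown')} started from {caller}"

    if event_type == "tool-calls":
        names = [tc["function"]["name"] for tc in message.get("toolCalls", [])]
        return "tools requested: " + ", ".join(names)

    if event_type == "end-of-call-report":
        call_id = message.get("call", {}).get("id", "unknown")
        # The unit of "duration" is not stated in the samples, so it is echoed as-is.
        return f"call {call_id} ended, duration={message.get('duration')}"

    return f"unhandled event type: {event_type!r}"
```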
Handling Webhook Events (PHP)
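<?php
// vapi-webhook.php - Handle incoming voice events
// executeAlfredTool() and logCall() are application helpers you must define yourself.
header('Content-Type: application/json');

$payload = json_decode(file_get_contents('php://input'), true);
if (!is_array($payload)) {
    // Reject bodies that are not valid JSON objects
    http_response_code(400);
    echo json_encode(['error' => 'Invalid JSON payload']);
    exit;
}

$type = $payload['message']['type'] ?? '';
switch ($type) {
    case 'assistant-request':
        // New call - return assistant configuration
        echo json_encode([
            'assistant' => [
                'firstMessage' => 'Hello! I am Alfred. How can I help?',
                'model' => [
                    'provider' => 'openai',
                    'model' => 'gpt-4',
                    'systemMessage' => 'You are Alfred, an AI assistant with 1,220+ tools...'
                ],
                'voice' => ['provider' => 'cartesia', 'voiceId' => 'sonic-english-male-1']
            ]
        ]);
        break;
    case 'tool-calls':
        // Execute each requested tool and collect the results
        $results = [];
        foreach ($payload['message']['toolCalls'] ?? [] as $tc) {
            // Arguments arrive as a JSON-encoded string; fall back to no arguments
            $args = json_decode($tc['function']['arguments'] ?? '', true) ?? [];
            $result = executeAlfredTool($tc['function']['name'], $args);
            $results[] = [
                'toolCallId' => $tc['id'],
                'result' => json_encode($result)
            ];
        }
        echo json_encode(['results' => $results]);
        break;
    case 'end-of-call-report':
        // Log call summary
        logCall($payload['message']);
        echo json_encode(['success' => true]);
        break;
    default:
        // Event type not handled here: acknowledge so the response body is never empty
        echo json_encode(['success' => true]);
        break;
}
?>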
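The tool-calls branch is the part most worth testing in isolation, since its response must echo each `toolCallId` with a JSON-encoded result. Below is a Python sketch of that branch under stated assumptions: `TOOL_REGISTRY` and the stubbed `weather_lookup` are hypothetical stand-ins for `executeAlfredTool`, and the `{"error": ...}` shape for failures is an illustrative choice, not documented behavior.

```python
import json
from typing import Any, Callable, Dict


def weather_lookup(location: str) -> Dict[str, Any]:
    """Hypothetical stub standing in for a real Alfred tool."""
    return {"location": location, "forecast": "stubbed"}


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "weather_lookup": weather_lookup,
}


def build_tool_results(message: Dict[str, Any]) -> Dict[str, Any]:
    """Run every tool call in a 'tool-calls' message and build the response body."""
    results = []
    for tc in message.get("toolCalls", []):
        name = tc["function"]["name"]
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            result: Dict[str, Any] = {"error": f"unknown tool: {name}"}
        else:
            try:
                # Arguments arrive as a JSON-encoded string, as in the sample event.
                args = json.loads(tc["function"]["arguments"])
                result = handler(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                result = {"error": f"bad arguments for {name}: {exc}"}
        results.append({"toolCallId": tc["id"], "result": json.dumps(result)})
    return {"results": results}
```

Returning an error entry per failed call, rather than raising, keeps one bad tool call from discarding the results of the others in the same event.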
Phone Number Provisioning
Alfred's primary phone number is 1-833-GOSITEME (1-833-467-4836). For custom phone numbers, provision through the Voice API or the voice management endpoint:
Provision a Number
const number = await fetch('https://gositeme.com/api/voice-manage.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'provision_number',
area_code: '514',
country: 'CA',
agent_id: 'agent_xyz'
})
}).then(r => r.json());
console.log(number);
// { phone_number: "+15141234567", status: "active", agent: "agent_xyz" }
List Your Numbers
curl -H "Authorization: Bearer sess_abc123..." \
"https://gositeme.com/api/voice-manage.php?action=list_numbers"
# {
# "numbers": [
# { "number": "+15141234567", "agent": "agent_xyz", "calls_today": 23, "status": "active" }
# ]
# }
Initiate Outbound Call
import requests

response = requests.post(
    'https://gositeme.com/api/vapi-outbound.php',
    headers={'Authorization': 'Bearer sess_abc123...'},
    json={
        'action': 'call',
        'to': '+15145551234',
        'from_number': '+15141234567',
        'agent_id': 'agent_xyz',
        'context': 'Follow up on support ticket #1234'
    },
    timeout=30,
)
response.raise_for_status()  # Surface HTTP errors before reading the body
call = response.json()
print(f"Call initiated: {call['call_id']}")
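Every phone number in the examples above is written in E.164 form (a plus sign, then up to 15 digits with no leading zero). Whether the API rejects other formats is not stated here, so treat the check below as a defensive sketch; `is_e164` and `build_outbound_payload` are illustrative helper names.

```python
import re

# E.164: "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True when the whole string is an E.164-formatted number."""
    return E164_PATTERN.fullmatch(number) is not None


def build_outbound_payload(to: str, from_number: str, agent_id: str, context: str = "") -> dict:
    """Build the outbound-call body, refusing numbers that are not E.164."""
    for label, value in (("to", to), ("from_number", from_number)):
        if not is_e164(value):
            raise ValueError(f"{label} is not in E.164 format: {value!r}")
    return {
        "action": "call",
        "to": to,
        "from_number": from_number,
        "agent_id": agent_id,
        "context": context,
    }
```

Validating locally gives a clear error before any request is sent, which is easier to diagnose than a failed call.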
Voice Agent Configuration
Voice agents are AI personalities that interact with callers. Configure their personality, tools, knowledge base, and voice settings.
Create a Voice Agent
const agent = await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'create_agent',
name: 'Receptionist',
personality: 'Professional, warm receptionist for a law firm. Always ask for caller name and purpose.',
tools: ['schedule_appointment', 'lookup_client', 'transfer_call', 'take_message'],
knowledge_base: 'kb_firm_info',
voice_enabled: true,
voice_engine: 'cartesia',
voice_config: {
voice_id: 'sonic-english-female-1',
speed: 1.0,
emotion: 'friendly'
},
call_settings: {
max_duration: 600,
silence_timeout: 10,
greeting: 'Thank you for calling Smith & Associates. How may I direct your call?'
}
})
}).then(r => r.json());
Agent Configuration Reference
| Field | Type | Description |
|---|---|---|
| personality | string | System prompt defining agent behavior and tone |
| tools | array | Tool names the agent can invoke during calls |
| knowledge_base | string | Knowledge base ID for RAG-powered responses |
| voice_engine | string | TTS engine: kokoro, orpheus, cartesia, elevenlabs |
| voice_config.voice_id | string | Specific voice ID from chosen engine |
| voice_config.speed | float | Speech speed multiplier (0.5–2.0, default: 1.0) |
| call_settings.max_duration | integer | Max call duration in seconds (default: 600) |
| call_settings.silence_timeout | integer | Hang up after N seconds of silence (default: 10) |
| call_settings.greeting | string | First message spoken when call connects |
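The ranges and defaults in the table can be checked client-side before an agent is created or updated. The following is a sketch, not part of the Alfred API: `validate_agent_config` is a hypothetical helper, and the requirement that the two integer settings be positive is an assumption beyond what the table states.

```python
import copy
from typing import Any, Dict, List, Tuple

VOICE_ENGINES = {"kokoro", "orpheus", "cartesia", "elevenlabs"}
CALL_DEFAULTS = {"max_duration": 600, "silence_timeout": 10}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_agent_config(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy with documented defaults filled in, plus a list of problems."""
    normalized = copy.deepcopy(config)
    errors: List[str] = []

    engine = normalized.get("voice_engine")
    if engine is not None and engine not in VOICE_ENGINES:
        errors.append(f"voice_engine must be one of {sorted(VOICE_ENGINES)}, got {engine!r}")

    speed = normalized.setdefault("voice_config", {}).setdefault("speed", 1.0)
    if not _is_number(speed) or not 0.5 <= speed <= 2.0:
        errors.append("voice_config.speed must be a number between 0.5 and 2.0")

    call_settings = normalized.setdefault("call_settings", {})
    for key, default in CALL_DEFAULTS.items():
        value = call_settings.setdefault(key, default)
        # Assumption: values must be positive; the table only says "integer".
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            errors.append(f"call_settings.{key} must be a positive integer")

    if not isinstance(normalized.get("tools", []), list):
        errors.append("tools must be an array")

    return normalized, errors
```

The input dictionary is never modified; callers get a normalized copy and can decide whether a non-empty error list should block the request.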
Supported Voice Engines
Alfred supports multiple text-to-speech engines. Choose based on your needs for latency, quality, and expressiveness.
Engine Comparison
| Engine | Latency | Quality | Languages | Plan Required |
|---|---|---|---|---|
| kokoro | ~80ms | Good | EN, FR, ES, DE, JA | All plans |
| orpheus | ~150ms | Very Good | EN, FR, ES | All plans |
| cartesia | ~120ms | Excellent | EN, FR, ES, DE, PT, JA, KO | Professional+ |
| elevenlabs | ~200ms | Premium | 29 languages | Enterprise |
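One way to use the comparison table is to filter engines by plan and language, then rank the remainder by latency. The sketch below encodes the table as data under a few assumptions: the plan name `basic` is a placeholder for the lowest tier, Enterprise is assumed to include everything in Professional, and because the 29 ElevenLabs languages are not enumerated here, `elevenlabs` is only returned when no language filter is given.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Assumed plan ordering; "basic" is a placeholder name for the lowest tier.
PLAN_RANK = {"basic": 0, "professional": 1, "enterprise": 2}


@dataclass(frozen=True)
class Engine:
    name: str
    latency_ms: int
    languages: Optional[FrozenSet[str]]  # None means "not enumerated in the table"
    min_plan: str


ENGINES = [
    Engine("kokoro", 80, frozenset({"EN", "FR", "ES", "DE", "JA"}), "basic"),
    Engine("orpheus", 150, frozenset({"EN", "FR", "ES"}), "basic"),
    Engine("cartesia", 120, frozenset({"EN", "FR", "ES", "DE", "PT", "JA", "KO"}), "professional"),
    Engine("elevenlabs", 200, None, "enterprise"),
]


def candidate_engines(plan: str, language: Optional[str] = None) -> List[str]:
    """Return engine names available on a plan, lowest latency first."""
    rank = PLAN_RANK[plan]
    matches = []
    for engine in ENGINES:
        if PLAN_RANK[engine.min_plan] > rank:
            continue
        if language is not None:
            # Skip engines whose language list is unknown or lacks the language.
            if engine.languages is None or language.upper() not in engine.languages:
                continue
        matches.append(engine)
    return [e.name for e in sorted(matches, key=lambda e: e.latency_ms)]
```

For example, `candidate_engines("professional", "PT")` narrows the choice to `cartesia`, the only listed engine with Portuguese on that plan.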
Setting a Voice Engine
// Update agent voice engine
await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'update_agent',
agent_id: 'agent_xyz',
voice_engine: 'cartesia',
voice_config: {
voice_id: 'sonic-english-female-1',
speed: 1.1,
emotion: 'professional'
}
})
});
# Python: switch an agent to ElevenLabs, which takes its own voice_config fields
import requests
requests.post('https://gositeme.com/api/fleet.php',
headers={'Authorization': 'Bearer sess_abc123...'},
json={
'action': 'update_agent',
'agent_id': 'agent_xyz',
'voice_engine': 'elevenlabs',
'voice_config': {
'voice_id': 'rachel',
'stability': 0.7,
'similarity_boost': 0.8
}
}
)
Web Voice Widget
Embed Alfred's voice widget in your website for browser-based voice interaction. The widget handles microphone access, speech-to-text, and real-time responses.
Quick Embed
<!-- Add to your page's <body> -->
<script src="https://gositeme.com/assets/js/alfred-voice-widget.js"></script>
<script>
AlfredVoice.init({
token: 'sess_abc123...', // Your session token
position: 'bottom-right', // bottom-right, bottom-left, top-right, top-left
theme: 'dark', // dark or light
accent: '#6c5ce7', // Widget accent color
greeting: 'Hi! How can I help you today?',
agent_id: 'agent_xyz', // Optional: use specific agent
voice_engine: 'kokoro', // TTS engine
tools: ['weather_lookup', 'dns_lookup', 'summarize_text'], // Allowed tools
onReady: () => console.log('Widget loaded'),
onMessage: (msg) => console.log('Alfred said:', msg.text),
onError: (err) => console.error('Voice error:', err)
});
</script>
Widget API
// Programmatic control
AlfredVoice.open(); // Open the widget
AlfredVoice.close(); // Close the widget
AlfredVoice.startListening(); // Start microphone
AlfredVoice.stopListening(); // Stop microphone
AlfredVoice.speak('Hello!'); // Text-to-speech
AlfredVoice.sendText('What is the weather in Montreal?'); // Send text input
AlfredVoice.destroy(); // Remove widget from DOM
// Event listeners
AlfredVoice.on('callStart', (data) => { /* call connected */ });
AlfredVoice.on('callEnd', (data) => { /* call ended */ });
AlfredVoice.on('transcript', (text) => { /* user speech recognized */ });
AlfredVoice.on('toolCall', (tool) => { /* tool being executed */ });
Custom Styling
/* Override widget styles */
.alfred-voice-widget {
--av-bg: #12121e;
--av-accent: #6c5ce7;
--av-text: #e8e8f0;
--av-radius: 16px;
--av-shadow: 0 8px 32px rgba(0,0,0,0.4);
}
.alfred-voice-btn {
width: 64px;
height: 64px;
border-radius: 50%;
}
.alfred-voice-panel {
width: 380px;
max-height: 500px;
}
Conference Room API
Conference rooms allow multiple AI agents and human participants to collaborate in real-time voice sessions. Useful for multi-agent workflows, team meetings with AI support, and complex problem-solving.
Create a Room
const room = await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'create_room',
name: 'Strategy Session',
agents: ['agent_analyst', 'agent_researcher', 'agent_writer'],
max_participants: 10,
recording: true,
auto_transcribe: true
})
}).then(r => r.json());
console.log(room);
// {
// id: "room_abc",
// name: "Strategy Session",
// join_url: "https://gositeme.com/conference-room.php?id=room_abc",
// agents: 3,
// recording: true
// }
Join a Room
// Via web browser
window.location.href = room.join_url;
// Via API (add participant)
await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'join_room',
room_id: 'room_abc',
participant: {
name: 'John Doe',
role: 'moderator'
}
})
});
Room Management
# Get room status
curl -H "Authorization: Bearer sess_abc123..." \
"https://gositeme.com/api/fleet.php?action=room_status&room_id=room_abc"
# End room and get transcript
curl -X POST https://gositeme.com/api/fleet.php \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sess_abc123..." \
-d '{"action":"end_room","room_id":"room_abc"}'
# Response includes transcript and recording URL
# {
# "transcript": "...",
# "recording_url": "https://gositeme.com/recordings/room_abc.mp3",
# "duration": 1847,
# "participants": 5,
# "tools_used": 12
# }
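The end_room response is easy to turn into a short human-readable summary, for instance for a log line or a notification. This sketch uses only the fields shown in the sample response and assumes `duration` is in seconds, which the sample does not state explicitly; both helper names are illustrative.

```python
from typing import Any, Dict


def format_duration(seconds: int) -> str:
    """Format a duration (assumed to be seconds) as M:SS or H:MM:SS."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def summarize_room(response: Dict[str, Any]) -> str:
    """Build a one-line summary from an end_room response."""
    return (
        f"{response['participants']} participants, "
        f"{response['tools_used']} tool calls, "
        f"duration {format_duration(response['duration'])}"
    )
```

With the sample response above (duration 1847, 5 participants, 12 tools used), `summarize_room` returns `5 participants, 12 tool calls, duration 30:47`.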