Voice Integration
Integrate Alfred's voice-first AI into your applications, with support for phone calls, web widgets, webhooks, and multiple voice engines.
Architecture Overview
Alfred Voice uses VAPI as the real-time voice transport layer. Calls come in via phone or web widget, get processed through VAPI, and route to Alfred's tool engine for execution.
Voice Call Flow
STT → Alfred Processing → TTS, with real-time tool execution in the middle
Key Components
- VAPI — Real-time voice transport and speech-to-text/text-to-speech pipeline
- Webhook Server — Your endpoint that receives voice events and returns tool results
- Tool Engine — Alfred's 875+ tools, executable via voice or API
- Voice Engines — Kokoro, Orpheus, Cartesia Sonic, ElevenLabs for TTS output
- Conference Rooms — Multi-agent voice collaboration spaces
VAPI Setup
VAPI handles the real-time voice connection between users and Alfred. You'll need to configure a VAPI assistant and point it to your Alfred webhook.
Step 1: Create a VAPI Account
Sign up at vapi.ai and obtain your API key from the dashboard.
Step 2: Create an Assistant
curl -X POST https://api.vapi.ai/assistant \
-H "Authorization: Bearer vapi_key_xxxxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Alfred AI",
"model": {
"provider": "custom-llm",
"url": "https://gositeme.com/api/vapi-webhook.php",
"model": "alfred-v2"
},
"voice": {
"provider": "cartesia",
"voiceId": "sonic-english-male-1"
},
"firstMessage": "Hello! I am Alfred, your AI assistant. How can I help you today?",
"serverUrl": "https://gositeme.com/api/vapi-webhook.php"
}'
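The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl payload above. `build_assistant_request` is a hypothetical helper name, and the request is built but not sent: pass the result to `urllib.request.urlopen` when you are ready to create the assistant.

```python
import json
import urllib.request

VAPI_ASSISTANT_URL = "https://api.vapi.ai/assistant"
ALFRED_WEBHOOK_URL = "https://gositeme.com/api/vapi-webhook.php"


def build_assistant_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "name": "Alfred AI",
        "model": {
            "provider": "custom-llm",
            "url": ALFRED_WEBHOOK_URL,
            "model": "alfred-v2",
        },
        "voice": {"provider": "cartesia", "voiceId": "sonic-english-male-1"},
        "firstMessage": "Hello! I am Alfred, your AI assistant. How can I help you today?",
        "serverUrl": ALFRED_WEBHOOK_URL,
    }
    return urllib.request.Request(
        VAPI_ASSISTANT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_assistant_request("vapi_key_xxxxx")  # placeholder key
    print(req.get_method(), req.full_url)
```

Keeping the webhook URL in one constant avoids the two fields (`model.url` and `serverUrl`) drifting apart.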
Step 3: Configure Server URL
Point your VAPI assistant's server URL to Alfred's webhook endpoint:
https://gositeme.com/api/vapi-webhook.php
This endpoint receives all VAPI events: call start, speech input, tool calls, and call end.
Webhook Configuration
Alfred exposes multiple webhook endpoints for voice event handling:
| Endpoint | Purpose |
|---|---|
| /api/vapi-webhook.php | Main VAPI event handler (call events, messages, tool calls) |
| /api/vapi-callback.php | Callback handler for async tool results |
| /api/vapi-tools.php | Tool registration endpoint for VAPI function calling |
| /api/vapi-auth.php | Voice session authentication |
| /api/vapi-outbound.php | Outbound call initiation |
Webhook Event Types
The main webhook receives events in this format:
// Call started
{
"message": {
"type": "assistant-request",
"call": {
"id": "call_abc123",
"phoneNumber": "+18334674836",
"customer": { "number": "+15145551234" }
}
}
}
// Tool call request
{
"message": {
"type": "tool-calls",
"toolCalls": [
{
"id": "tc_xyz",
"type": "function",
"function": {
"name": "weather_lookup",
"arguments": "{\"location\":\"Montreal\"}"
}
}
]
}
}
// Call ended
{
"message": {
"type": "end-of-call-report",
"call": { "id": "call_abc123" },
"summary": "User asked about weather in Montreal",
"duration": 45
}
}
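One detail in the `tool-calls` sample is easy to miss: `function.arguments` is a JSON-encoded string, not a nested object, so it needs a second decode. A minimal Python sketch using the sample payload above (the helper name `extract_tool_calls` is hypothetical):

```python
import json


def extract_tool_calls(payload: dict) -> list:
    """Return (tool_call_id, tool_name, decoded_arguments) for each tool call."""
    message = payload.get("message", {})
    if message.get("type") != "tool-calls":
        return []
    calls = []
    for tc in message.get("toolCalls", []):
        fn = tc["function"]
        # "arguments" arrives as a JSON string and must be decoded separately.
        args = json.loads(fn["arguments"])
        calls.append((tc["id"], fn["name"], args))
    return calls


sample = json.loads("""
{
  "message": {
    "type": "tool-calls",
    "toolCalls": [
      {
        "id": "tc_xyz",
        "type": "function",
        "function": {
          "name": "weather_lookup",
          "arguments": "{\\"location\\":\\"Montreal\\"}"
        }
      }
    ]
  }
}
""")

print(extract_tool_calls(sample))
# [('tc_xyz', 'weather_lookup', {'location': 'Montreal'})]
```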
Handling Webhook Events (PHP)
<?php
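// vapi-webhook.php - Handle incoming VAPI events
// executeAlfredTool() and logCall() are not defined here; supply them in your application.
header('Content-Type: application/json');
$payload = json_decode(file_get_contents('php://input'), true);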
$type = $payload['message']['type'] ?? '';
switch ($type) {
case 'assistant-request':
// New call - return assistant configuration
echo json_encode([
'assistant' => [
'firstMessage' => 'Hello! I am Alfred. How can I help?',
'model' => [
'provider' => 'openai',
'model' => 'gpt-4',
'systemMessage' => 'You are Alfred, an AI assistant with 875+ tools...'
],
'voice' => ['provider' => 'cartesia', 'voiceId' => 'sonic-english-male-1']
]
]);
break;
case 'tool-calls':
// Execute tool calls
$results = [];
foreach ($payload['message']['toolCalls'] as $tc) {
$args = json_decode($tc['function']['arguments'], true);
$result = executeAlfredTool($tc['function']['name'], $args);
$results[] = [
'toolCallId' => $tc['id'],
'result' => json_encode($result)
];
}
echo json_encode(['results' => $results]);
break;
case 'end-of-call-report':
// Log call summary
logCall($payload['message']);
echo json_encode(['success' => true]);
break;
}
?>
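For servers not written in PHP, the same dispatch logic can be sketched in Python. This is an illustrative port, not Alfred's implementation: `TOOLS` stands in for `executeAlfredTool`, the `weather_lookup` stub returns placeholder data, the `assistant-request` response is abbreviated, and the fallback for unknown event types is an assumption.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry standing in for executeAlfredTool().
TOOLS: Dict[str, Callable[..., Any]] = {
    "weather_lookup": lambda location: {"location": location, "forecast": "stub"},
}


def handle_event(payload: dict) -> dict:
    """Map one VAPI webhook payload to the response body to send back."""
    message = payload.get("message", {})
    event_type = message.get("type", "")

    if event_type == "assistant-request":
        # Abbreviated: the PHP version also returns model and voice settings.
        return {"assistant": {"firstMessage": "Hello! I am Alfred. How can I help?"}}

    if event_type == "tool-calls":
        results = []
        for tc in message.get("toolCalls", []):
            name = tc["function"]["name"]
            args = json.loads(tc["function"]["arguments"])
            tool = TOOLS.get(name)
            # Report unknown tools instead of raising, so the call can continue.
            result = tool(**args) if tool else {"error": f"unknown tool: {name}"}
            results.append({"toolCallId": tc["id"], "result": json.dumps(result)})
        return {"results": results}

    if event_type == "end-of-call-report":
        # A real handler would persist the report here (logCall in the PHP version).
        return {"success": True}

    # Assumption: acknowledge any other event type with an empty object.
    return {}
```

Writing the handler as a pure function from payload to response makes it straightforward to unit-test before wiring it into a web framework.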
Phone Number Provisioning
Alfred's primary phone number is 1-833-GOSITEME (1-833-467-4836). For custom phone numbers, provision through VAPI or the voice management API:
Provision a Number
const number = await fetch('https://gositeme.com/api/voice-manage.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'provision_number',
area_code: '514',
country: 'CA',
agent_id: 'agent_xyz'
})
}).then(r => r.json());
console.log(number);
// { phone_number: "+15141234567", status: "active", agent: "agent_xyz" }
List Your Numbers
curl -H "Authorization: Bearer sess_abc123..." \
"https://gositeme.com/api/voice-manage.php?action=list_numbers"
# {
# "numbers": [
# { "number": "+15141234567", "agent": "agent_xyz", "calls_today": 23, "status": "active" }
# ]
# }
Initiate Outbound Call
import requests
call = requests.post('https://gositeme.com/api/vapi-outbound.php',
headers={'Authorization': 'Bearer sess_abc123...'},
json={
'action': 'call',
'to': '+15145551234',
'from_number': '+15141234567',
'agent_id': 'agent_xyz',
'context': 'Follow up on support ticket #1234'
}
).json()
print(f"Call initiated: {call['call_id']}")
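The `to` and `from_number` values above are written in E.164 format: a `+` followed by up to 15 digits, with no leading zero. A small pre-flight check can catch malformed numbers before the request is sent. `is_e164` is a hypothetical helper, and whether the API itself rejects other formats is not stated here.

```python
import re

# E.164: "+" then 1 to 15 digits, the first of which is not 0.
_E164 = re.compile(r"\+[1-9][0-9]{0,14}")


def is_e164(number: str) -> bool:
    """Return True if the whole string is a syntactically valid E.164 number."""
    return _E164.fullmatch(number) is not None


print(is_e164("+15145551234"))  # True
print(is_e164("514-555-1234"))  # False
```

This checks syntax only; it does not confirm that the number is assigned or reachable.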
Voice Agent Configuration
Voice agents are AI personalities that interact with callers. Configure their personality, tools, knowledge base, and voice settings.
Create a Voice Agent
const agent = await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'create_agent',
name: 'Receptionist',
personality: 'Professional, warm receptionist for a law firm. Always ask for caller name and purpose.',
tools: ['schedule_appointment', 'lookup_client', 'transfer_call', 'take_message'],
knowledge_base: 'kb_firm_info',
voice_enabled: true,
voice_engine: 'cartesia',
voice_config: {
voice_id: 'sonic-english-female-1',
speed: 1.0,
emotion: 'friendly'
},
call_settings: {
max_duration: 600,
silence_timeout: 10,
greeting: 'Thank you for calling Smith & Associates. How may I direct your call?'
}
})
}).then(r => r.json());
Agent Configuration Reference
| Field | Type | Description |
|---|---|---|
| personality | string | System prompt defining agent behavior and tone |
| tools | array | Tool names the agent can invoke during calls |
| knowledge_base | string | Knowledge base ID for RAG-powered responses |
| voice_engine | string | TTS engine: kokoro, orpheus, cartesia, elevenlabs |
| voice_config.voice_id | string | Specific voice ID from chosen engine |
| voice_config.speed | float | Speech speed multiplier (0.5–2.0, default: 1.0) |
| call_settings.max_duration | integer | Max call duration in seconds (default: 600) |
| call_settings.silence_timeout | integer | Hang up after N seconds of silence (default: 10) |
| call_settings.greeting | string | First message spoken when call connects |
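The defaults and ranges in the table above can be applied client-side before a configuration is sent. The sketch below is one possible approach rather than part of the Alfred API: `normalize_agent_config` is a hypothetical helper, and treating `voice_engine` as a required field is an assumption.

```python
VOICE_ENGINES = {"kokoro", "orpheus", "cartesia", "elevenlabs"}


def normalize_agent_config(config: dict) -> dict:
    """Fill in the documented defaults and reject out-of-range values."""
    voice = dict(config.get("voice_config", {}))
    calls = dict(config.get("call_settings", {}))

    # Defaults taken from the reference table.
    voice.setdefault("speed", 1.0)
    calls.setdefault("max_duration", 600)
    calls.setdefault("silence_timeout", 10)

    engine = config.get("voice_engine")
    if engine not in VOICE_ENGINES:
        raise ValueError(f"unsupported voice_engine: {engine!r}")
    if not 0.5 <= voice["speed"] <= 2.0:
        raise ValueError("voice_config.speed must be between 0.5 and 2.0")

    # Return a new dict so the caller's config is left untouched.
    return {**config, "voice_config": voice, "call_settings": calls}


print(normalize_agent_config({"name": "Receptionist", "voice_engine": "cartesia"}))
```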
Supported Voice Engines
Alfred supports multiple text-to-speech engines. Choose based on your needs for latency, quality, and expressiveness.
Engine Comparison
| Engine | Latency | Quality | Languages | Plan Required |
|---|---|---|---|---|
| kokoro | ~80ms | Good | EN, FR, ES, DE, JA | All plans |
| orpheus | ~150ms | Very Good | EN, FR, ES | All plans |
| cartesia | ~120ms | Excellent | EN, FR, ES, DE, PT, JA, KO | Professional+ |
| elevenlabs | ~200ms | Premium | 29 languages | Enterprise |
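As a rough illustration of choosing from the comparison table, the sketch below picks the lowest-latency engine available for a given language and plan. Several things here are assumptions: the helper name `fastest_engine`, the plan ordering (any plan, then Professional, then Enterprise), and the treatment of unrecognized plan names as the base tier. ElevenLabs is left out of language matching because the table gives only a language count, not a list.

```python
from typing import Optional

# (engine, approx. latency in ms, listed languages or None, minimum plan)
ENGINES = [
    ("kokoro", 80, {"EN", "FR", "ES", "DE", "JA"}, "all"),
    ("orpheus", 150, {"EN", "FR", "ES"}, "all"),
    ("cartesia", 120, {"EN", "FR", "ES", "DE", "PT", "JA", "KO"}, "professional"),
    ("elevenlabs", 200, None, "enterprise"),  # languages not enumerated in the table
]

# Assumed ordering of plan tiers; "all" means every plan qualifies.
PLAN_RANK = {"all": 0, "professional": 1, "enterprise": 2}


def fastest_engine(language: str, plan: str) -> Optional[str]:
    """Return the lowest-latency engine listed for this language and plan."""
    rank = PLAN_RANK.get(plan.lower(), 0)  # unknown plan names count as base tier
    candidates = [
        (latency, name)
        for name, latency, languages, min_plan in ENGINES
        if languages is not None
        and language.upper() in languages
        and PLAN_RANK[min_plan] <= rank
    ]
    return min(candidates)[1] if candidates else None


print(fastest_engine("EN", "enterprise"))    # kokoro
print(fastest_engine("PT", "professional"))  # cartesia
```

Latency is only one axis; where quality matters more, the same table could be sorted on the Quality column instead.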
Setting a Voice Engine
// Update agent voice engine
await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'update_agent',
agent_id: 'agent_xyz',
voice_engine: 'cartesia',
voice_config: {
voice_id: 'sonic-english-female-1',
speed: 1.1,
emotion: 'professional'
}
})
});
# Same update from Python, here selecting ElevenLabs
import requests
requests.post('https://gositeme.com/api/fleet.php',
headers={'Authorization': 'Bearer sess_abc123...'},
json={
'action': 'update_agent',
'agent_id': 'agent_xyz',
'voice_engine': 'elevenlabs',
'voice_config': {
'voice_id': 'rachel',
'stability': 0.7,
'similarity_boost': 0.8
}
}
)
Web Voice Widget
Embed Alfred's voice widget in your website for browser-based voice interaction. The widget handles microphone access, speech-to-text, and real-time responses.
Quick Embed
<!-- Add to your page's <body> -->
<script src="https://gositeme.com/assets/js/alfred-voice-widget.js"></script>
<script>
AlfredVoice.init({
token: 'sess_abc123...', // Your session token
position: 'bottom-right', // bottom-right, bottom-left, top-right, top-left
theme: 'dark', // dark or light
accent: '#6c5ce7', // Widget accent color
greeting: 'Hi! How can I help you today?',
agent_id: 'agent_xyz', // Optional: use specific agent
voice_engine: 'kokoro', // TTS engine
tools: ['weather_lookup', 'dns_lookup', 'summarize_text'], // Allowed tools
onReady: () => console.log('Widget loaded'),
onMessage: (msg) => console.log('Alfred said:', msg.text),
onError: (err) => console.error('Voice error:', err)
});
</script>
Widget API
// Programmatic control
AlfredVoice.open(); // Open the widget
AlfredVoice.close(); // Close the widget
AlfredVoice.startListening(); // Start microphone
AlfredVoice.stopListening(); // Stop microphone
AlfredVoice.speak('Hello!'); // Text-to-speech
AlfredVoice.sendText('What is the weather in Montreal?'); // Send text input
AlfredVoice.destroy(); // Remove widget from DOM
// Event listeners
AlfredVoice.on('callStart', (data) => { /* call connected */ });
AlfredVoice.on('callEnd', (data) => { /* call ended */ });
AlfredVoice.on('transcript', (text) => { /* user speech recognized */ });
AlfredVoice.on('toolCall', (tool) => { /* tool being executed */ });
Custom Styling
/* Override widget styles */
.alfred-voice-widget {
--av-bg: #12121e;
--av-accent: #6c5ce7;
--av-text: #e8e8f0;
--av-radius: 16px;
--av-shadow: 0 8px 32px rgba(0,0,0,0.4);
}
.alfred-voice-btn {
width: 64px;
height: 64px;
border-radius: 50%;
}
.alfred-voice-panel {
width: 380px;
max-height: 500px;
}
Conference Room API
Conference rooms allow multiple AI agents and human participants to collaborate in real-time voice sessions. Useful for multi-agent workflows, team meetings with AI support, and complex problem-solving.
Create a Room
const room = await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'create_room',
name: 'Strategy Session',
agents: ['agent_analyst', 'agent_researcher', 'agent_writer'],
max_participants: 10,
recording: true,
auto_transcribe: true
})
}).then(r => r.json());
console.log(room);
// {
// id: "room_abc",
// name: "Strategy Session",
// join_url: "https://gositeme.com/conference-room.php?id=room_abc",
// agents: 3,
// recording: true
// }
Join a Room
// Via web browser
window.location.href = room.join_url;
// Via API (add participant)
await fetch('https://gositeme.com/api/fleet.php', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sess_abc123...'
},
body: JSON.stringify({
action: 'join_room',
room_id: 'room_abc',
participant: {
name: 'John Doe',
role: 'moderator'
}
})
});
Room Management
# Get room status
curl -H "Authorization: Bearer sess_abc123..." \
"https://gositeme.com/api/fleet.php?action=room_status&room_id=room_abc"
# End room and get transcript
curl -X POST https://gositeme.com/api/fleet.php \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sess_abc123..." \
-d '{"action":"end_room","room_id":"room_abc"}'
# Response includes transcript and recording URL
# {
# "transcript": "...",
# "recording_url": "https://gositeme.com/recordings/room_abc.mp3",
# "duration": 1847,
# "participants": 5,
# "tools_used": 12
# }
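Once a room has ended, the response fields can be turned into a short summary for logs or notifications. The sketch below works on the sample response above and assumes `duration` is in seconds, which the response itself does not state; `summarize_room` is a hypothetical helper.

```python
def summarize_room(report: dict) -> str:
    """Format an end_room response as a one-line summary."""
    # Assumption: "duration" is expressed in seconds.
    minutes, seconds = divmod(report["duration"], 60)
    return (
        f"{report['participants']} participants, "
        f"{minutes}m {seconds:02d}s, "
        f"{report['tools_used']} tools used"
    )


report = {
    "transcript": "...",
    "recording_url": "https://gositeme.com/recordings/room_abc.mp3",
    "duration": 1847,
    "participants": 5,
    "tools_used": 12,
}

print(summarize_room(report))
# 5 participants, 30m 47s, 12 tools used
```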