Voice Agent using Realtime API

Building Real-Time Voice Agents with VoiceAgentBuilder and Twilio

Install Agent Builder

pip install agent-builder==0.1.1

The agent-builder package provides a streamlined way to create real-time voice agents powered by OpenAI’s Realtime API, so developers can add live voice interaction to their applications.

Example: Setting Up a Voice Agent with Twilio

Below is an example implementation of a voice agent that:

  • Uses Twilio’s Media Streams to handle voice calls.
  • Streams real-time audio to OpenAI’s Realtime API.
  • Processes the response and sends it back to the caller.
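For orientation, here is a rough sketch of the TwiML that tells Twilio to open a media stream for a call. It uses only the standard library so the XML shape is easy to inspect. The function name build_stream_twiml and the host example.ngrok.app are placeholders for illustration, not part of agent-builder or Twilio's SDK.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, path: str = "/media-stream") -> str:
    """Return TwiML asking Twilio to connect the call to a WebSocket stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> inside <Connect> makes the stream bidirectional, so audio
    # can be sent back to the caller over the same WebSocket.
    ET.SubElement(connect, "Stream", {"url": f"wss://{host}{path}"})
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical public host; replace with wherever the app is reachable.
    print(build_stream_twiml("example.ngrok.app"))
```

Twilio requires a wss:// URL here, so the app has to be reachable over TLS (for local testing, typically through a tunnel).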

FastAPI WebSocket Endpoints for Twilio Integration

import json
import logging
from typing import AsyncIterator

from fastapi import FastAPI, WebSocket, Request
from fastapi.responses import HTMLResponse
from starlette.websockets import WebSocketDisconnect
from twilio.twiml.voice_response import Connect, VoiceResponse

from agent_builder.builders.voice_agent_builder import VoiceAgentBuilder

logger = logging.getLogger(__name__)
app = FastAPI()

@app.api_route("/incoming-call", methods=["GET", "POST"])
async def handle_incoming_call(request: Request):
    """
    Handles incoming Twilio calls and returns TwiML to connect to the media stream.
    """
    response = VoiceResponse()
    host = request.url.hostname
    connect = Connect()
    connect.stream(url=f"wss://{host}/media-stream")
    response.append(connect)
    return HTMLResponse(content=str(response), media_type="application/xml")

@app.websocket("/media-stream")
async def handle_media_stream(websocket: WebSocket):
    """
    WebSocket endpoint to process audio streams from Twilio and send responses.
    """
    await websocket.accept()
    stream_sid = None

    async def input_audio_stream() -> AsyncIterator[str]:
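        """
        Yields Realtime API events: an initial response.create, then one
        input_audio_buffer.append event per audio frame received from Twilio.
        """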
        nonlocal stream_sid
        yield json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text", "audio"],
                "voice": "alloy",
                "output_audio_format": "g711_ulaw"
            }
        })

        async for message in websocket.iter_text():
            data = json.loads(message)
            event = data.get("event")

            if event == "start":
                stream_sid = data["start"]["streamSid"]
                logger.info(f"Stream started: {stream_sid}")
            elif event == "media":
                yield json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": data["media"]["payload"]
                })
            elif event == "stop":
                logger.info("Stream stopped")
                break

    async def handle_output_event(event_str: str):
        """
        Processes responses from OpenAI and sends them to Twilio.
        """
        event = json.loads(event_str)
        if event.get("type") == "response.audio.delta" and "delta" in event:
            if stream_sid:
                await websocket.send_text(json.dumps({
                    "event": "media",
                    "streamSid": stream_sid,
                    "media": {"payload": event["delta"]}
                }))
    
    voice_agent = VoiceAgentBuilder().set_voice("alloy").set_input_audio_format("g711_ulaw").build()
    await voice_agent.ainvoke(input_audio_stream(), handle_output_event)
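The two inner functions above mostly translate JSON messages between Twilio and the Realtime API. As a sketch, that translation can be pulled out into pure functions that are easy to unit test without a live call. The names twilio_to_realtime and realtime_to_twilio are my own for illustration; the message fields are the ones already used in the endpoint above.

```python
import json
from typing import Optional


def twilio_to_realtime(message: str) -> Optional[str]:
    """Map one Twilio Media Streams message to a Realtime API event.

    Returns None for messages that carry no audio (start, stop, mark, ...).
    """
    data = json.loads(message)
    if data.get("event") == "media":
        return json.dumps({
            "type": "input_audio_buffer.append",
            "audio": data["media"]["payload"],  # base64 g711_ulaw from Twilio
        })
    return None


def realtime_to_twilio(event_str: str, stream_sid: Optional[str]) -> Optional[str]:
    """Map one Realtime API event to a Twilio media message.

    Returns None when the event has no audio delta or the stream SID
    is not known yet.
    """
    event = json.loads(event_str)
    if stream_sid and event.get("type") == "response.audio.delta" and "delta" in event:
        return json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": event["delta"]},
        })
    return None
```

With helpers like these, input_audio_stream would yield whatever twilio_to_realtime returns when it is not None, and handle_output_event would send whatever realtime_to_twilio returns.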

Example: Creating a Weather Tool for Use in Voice Agents

We can extend the capabilities of our voice agent by adding tools. Below is an example of a tool that fetches the current weather.

from pydantic import BaseModel, Field
from agent_builder.builders.tool_builder import ToolBuilder
import requests

class WeatherRequest(BaseModel):
    city: str = Field(description="City name to get the weather for.")

async def get_current_weather(city: str) -> str:
    """
    Fetches the current weather for a given city.
    """
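    try:
        # Pass the city via params so it is URL-encoded (e.g. "New York").
        # Note: requests is synchronous and blocks the event loop while it runs.
        response = requests.get(
            "https://api.weatherapi.com/v1/current.json",
            params={"key": "YOUR_API_KEY", "q": city},
            timeout=10,
        )
        response.raise_for_status()
        current = response.json()["current"]
        return f"Current weather in {city}: {current['temp_c']}°C, {current['condition']['text']}."
    except Exception as e:
        return f"Failed to fetch weather: {e}"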


def create_weather_tool():
    """
    Builds and returns a tool for fetching weather information.
    """
    tool_builder = ToolBuilder()
    tool_builder.set_name("GetWeather")
    tool_builder.set_function(get_current_weather)
    tool_builder.set_coroutine(get_current_weather)
    tool_builder.set_description("Fetches the current weather for a given city.")
    tool_builder.set_schema(WeatherRequest)
    return tool_builder.build()
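Since the voice agent runs on an asyncio event loop, a slow HTTP call inside the tool can stall audio streaming. One possible way around that, sketched below under my own assumptions, is to run the blocking fetch in a worker thread with asyncio.to_thread (Python 3.9+). The helper names and the injected fetch_json callable are hypothetical; in practice fetch_json could wrap requests.get(url, timeout=10).json().

```python
import asyncio
from typing import Callable
from urllib.parse import urlencode

WEATHER_URL = "https://api.weatherapi.com/v1/current.json"


def build_weather_url(city: str, api_key: str) -> str:
    """Build the request URL with the city safely URL-encoded."""
    return f"{WEATHER_URL}?{urlencode({'key': api_key, 'q': city})}"


def format_weather(city: str, payload: dict) -> str:
    """Turn the API's JSON payload into a sentence the agent can speak."""
    current = payload["current"]
    return f"Current weather in {city}: {current['temp_c']}°C, {current['condition']['text']}."


async def get_weather(city: str, api_key: str, fetch_json: Callable[[str], dict]) -> str:
    """Fetch and format the weather without blocking the event loop.

    fetch_json is any blocking function that takes a URL and returns
    the decoded JSON body; it runs in a worker thread.
    """
    url = build_weather_url(city, api_key)
    try:
        payload = await asyncio.to_thread(fetch_json, url)
        return format_weather(city, payload)
    except Exception as exc:
        return f"Failed to fetch weather: {exc}"
```

Injecting fetch_json also makes the tool testable offline, since a test can pass a function that returns canned JSON.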

Adding the Tool to the Voice Agent

weather_tool = create_weather_tool()
voice_agent = VoiceAgentBuilder().set_voice("alloy").set_input_audio_format("g711_ulaw").set_tools([weather_tool]).build()

Conclusion

This example showcases how you can:

  • Use VoiceAgentBuilder to create real-time voice agents.
  • Connect with Twilio for live calls.
  • Extend capabilities with tools like a weather fetching tool.

Would love to hear how you’re using this! 🚀
