Hi everyone,
I’m building a voice AI assistant using:
- Twilio Programmable Voice
- FastAPI WebSocket server
- OpenAI’s new gpt-4o-realtime-preview-2024-10-01 via WebSocket
The goal is to create a real-time conversational AI agent that receives Twilio’s audio stream and responds using OpenAI’s Realtime API.
What I’ve done:
- Twilio streams call audio via Media Streams to my FastAPI WebSocket endpoint
- The server bridges messages between the Twilio WebSocket and OpenAI's WebSocket
- I send a session.update event as expected
- I base64 encode/decode the audio as needed
- Everything connects cleanly, with no auth errors
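For comparison, this is a minimal sketch of a session.update event for the beta Realtime API that matches Twilio's audio format. It assumes the g711_ulaw input and output formats (Twilio Media Streams carry 8 kHz mu-law audio), server-side voice activity detection, and the "alloy" voice. build_session_update is a hypothetical helper name, not part of any SDK. With mu-law on both sides, the base64 payloads can in principle be forwarded as-is, without decoding or transcoding.

```python
import json


def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Hypothetical helper: serialize a session.update event (beta Realtime API).

    Assumption: g711_ulaw in both directions, matching Twilio's mu-law stream,
    so audio payloads never need to be decoded or re-encoded by the bridge.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "voice": voice,
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            # Let the server decide when the caller has finished speaking.
            "turn_detection": {"type": "server_vad"},
        },
    })
```

The returned string would be sent with await openai_ws.send(...) right after the connection opens, before any audio is appended.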
The problem:
The OpenAI WebSocket closes almost immediately with this log:
Error processing message: received 1000 (OK); then sent 1000 (OK)
There’s no actual generation or processing before the connection closes. On the OpenAI usage dashboard, I see no usage, so the request clearly didn’t trigger inference.
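In the websockets library, the "received 1000 (OK); then sent 1000 (OK)" wording means the close frame arrived from the OpenAI side first and the client answered it. To see what the server did before closing, one option is to log every server event it sends; if it rejected one of the client events, a Realtime "error" event (with an error object holding type, code and message) would show up in that log. describe_event is a hypothetical helper for that purpose, sketched under those assumptions.

```python
import json


def describe_event(raw: str) -> str:
    """Hypothetical helper: one-line summary of a Realtime API server event.

    For 'error' events, include the error type, code and message, which are
    more useful for debugging than the bare WebSocket close code.
    """
    event = json.loads(raw)
    kind = event.get("type", "<missing type>")
    if kind == "error":
        err = event.get("error", {})
        return f"error: {err.get('type')} {err.get('code')}: {err.get('message')}"
    return kind
```

Used as async for raw in openai_ws: print(describe_event(raw)), this would show whether session.created and session.updated ever arrive before the close.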
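```python
import websockets

async with websockets.connect(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01",
    additional_headers={  # websockets >= 14; older releases call this extra_headers
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "OpenAI-Beta": "realtime=v1",
    },
) as openai_ws:
    ...  # send session.update and bridge audio while this block is open
```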
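A minimal sketch of a bridge that could run inside that async with block, assuming twilio_ws is the FastAPI (Starlette) WebSocket and openai_ws is the websockets connection. The bridge function and its session_update argument are hypothetical. The event names are the Twilio Media Streams ones (start, media, stop) and the beta Realtime API ones (input_audio_buffer.append, response.audio.delta, error); audio is forwarded unchanged on the assumption that the session was configured for g711_ulaw.

```python
import asyncio
import json
from typing import Optional


async def bridge(twilio_ws, openai_ws, session_update: str) -> None:
    """Hypothetical sketch: pump audio both ways until either side stops.

    twilio_ws: FastAPI/Starlette WebSocket (uses iter_text() and send_text()).
    openai_ws: websockets client connection (uses send() and async iteration).
    """
    await openai_ws.send(session_update)  # configure the session before any audio
    stream_sid: Optional[str] = None

    async def twilio_to_openai() -> None:
        nonlocal stream_sid
        async for raw in twilio_ws.iter_text():
            msg = json.loads(raw)
            if msg["event"] == "start":
                # Outbound media messages must carry the stream SID from 'start'.
                stream_sid = msg["start"]["streamSid"]
            elif msg["event"] == "media":
                # Assumes g711_ulaw: Twilio's base64 payload is forwarded as-is.
                await openai_ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": msg["media"]["payload"],
                }))
            elif msg["event"] == "stop":
                break

    async def openai_to_twilio() -> None:
        async for raw in openai_ws:
            event = json.loads(raw)
            if event["type"] == "response.audio.delta" and stream_sid:
                await twilio_ws.send_text(json.dumps({
                    "event": "media",
                    "streamSid": stream_sid,
                    "media": {"payload": event["delta"]},
                }))
            elif event["type"] == "error":
                print("OpenAI error:", event["error"])

    # Run both directions concurrently; whichever finishes first ends the bridge.
    tasks = [asyncio.create_task(twilio_to_openai()),
             asyncio.create_task(openai_to_twilio())]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise any exception from the side that stopped first
```

Inside the endpoint this would be called as await bridge(websocket, openai_ws, session_update) after await websocket.accept(). Running each direction as its own task and cancelling the survivor keeps one socket from silently outliving the other, and the error branch surfaces anything the server rejects.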
Is there a minimal required sequence of events to keep the session alive and initiate streaming?
Anyone have a working Twilio → OpenAI realtime setup I could compare with?
Thanks a lot in advance