Delayed speech_stopped Event in a Call Center System Using the OpenAI Realtime API with Twilio

Hi everyone,

We’re developing a call center system using the OpenAI Realtime API, but we’re having trouble receiving the speech_stopped event in real time. The speech_started event fires promptly, but speech_stopped is inconsistent: sometimes it arrives immediately, and other times it takes up to 15 seconds.
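One way to tell whether the delay comes from VAD itself or from audio reaching the model late is to compare the wall-clock time between the two events with the audio time the events report. Below is a minimal, hypothetical diagnostic sketch. It assumes the input_audio_buffer.speech_started and input_audio_buffer.speech_stopped events carry the audio_start_ms and audio_end_ms fields described in the Realtime API reference, and that every parsed server event is passed to on_event.

```python
import time
from typing import Callable, Optional


class VadLagTracker:
    """Hypothetical helper: compares wall-clock and audio time between VAD events.

    If the wall-clock gap between speech_started and speech_stopped is much
    larger than the audio gap the events report, the audio most likely reached
    the model late, rather than VAD being slow to detect the end of speech.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._wall_start: Optional[float] = None
        self._audio_start_ms: Optional[int] = None

    def on_event(self, event: dict) -> Optional[float]:
        """Feed every parsed server event; returns the extra lag in seconds when a turn ends."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._wall_start = self._clock()
            self._audio_start_ms = event.get("audio_start_ms")
        elif kind == "input_audio_buffer.speech_stopped" and self._wall_start is not None:
            wall_gap_s = self._clock() - self._wall_start
            audio_start_ms = self._audio_start_ms
            audio_end_ms = event.get("audio_end_ms")
            self._wall_start = None
            self._audio_start_ms = None
            if audio_start_ms is None or audio_end_ms is None:
                return None  # cannot separate transport lag from speech length
            audio_gap_s = (audio_end_ms - audio_start_ms) / 1000.0
            # Positive: the event arrived later than the audio itself implies.
            return wall_gap_s - audio_gap_s
        return None
```

A result near zero means speech_stopped arrived roughly when the audio timeline implies; a value of several seconds suggests the audio itself was delayed in transit.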

We’re using Twilio for the voice interface, and the delay in speech_stopped disrupts the call flow, because right afterward we handle the response.function_call_arguments.done event to retrieve information from a RAG (Retrieval-Augmented Generation) system.
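For context, a function-call step like this can be sketched as follows. With server_vad creating responses automatically, the model only emits the function call after the turn ends, so any lag in speech_stopped pushes the whole lookup back. The sketch turns a response.function_call_arguments.done event into the two client events the Realtime API expects next: a conversation.item.create carrying a function_call_output, then a response.create. The rag_lookup callable is a hypothetical stand-in for the RAG query, and the "query" argument name in the test is illustrative only.

```python
import json
from typing import Callable, Dict, List


def handle_function_call_done(event: Dict, rag_lookup: Callable[[Dict], object]) -> List[Dict]:
    """Build the client events to send after a response.function_call_arguments.done event.

    rag_lookup is a hypothetical stand-in for your own RAG query function.
    """
    # The Realtime API delivers the arguments as a JSON-encoded string.
    args = json.loads(event["arguments"])
    result = rag_lookup(args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],  # ties the result to the model's call
                # The output field is a string, so non-string results are JSON-encoded.
                "output": result if isinstance(result, str) else json.dumps(result),
            },
        },
        # Ask the model to continue now that it has the tool result.
        {"type": "response.create"},
    ]
```

Each returned dict would then be serialized with json.dumps and sent over the Realtime WebSocket.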

Has anyone encountered similar issues or found ways to optimize server_vad for better responsiveness? Are there specific parameters that might help minimize this delay?
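The server_vad parameters live under turn_detection in a session.update event. The sketch below builds that event; the values shown are the documented defaults (threshold 0.5, prefix_padding_ms 300, silence_duration_ms 500), not recommendations, and g711_ulaw is assumed as the input format because Twilio Media Streams carry 8 kHz mu-law audio.

```python
import json


def build_vad_session_update(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> str:
    """Build a session.update event (as a JSON string) that tunes server_vad."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            # Twilio Media Streams send 8 kHz mu-law audio.
            "input_audio_format": "g711_ulaw",
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,  # higher = louder audio needed to count as speech
                "prefix_padding_ms": prefix_padding_ms,  # audio kept before detected speech
                "silence_duration_ms": silence_duration_ms,  # silence needed to end a turn
            },
        },
    }
    return json.dumps(event)
```

Lowering silence_duration_ms ends turns sooner but risks cutting callers off during short pauses; raising threshold requires louder audio to trigger detection, which may help on noisy phone lines. Neither setting helps if the audio itself is arriving late.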

Any insights or parameter recommendations would be greatly appreciated!

Thanks in advance!

We wanted to follow up on our previous post about the delayed speech_stopped events. After a thorough investigation, we’ve determined that the problem wasn’t in our implementation but in the connection between Twilio and our server.

Specifically, the issue appears to have been tied to a number we had purchased in Chicago. For reasons Twilio is still investigating, this number occasionally suffered substantial connection losses, which delayed speech_stopped detection and also degraded speech understanding overall.
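For anyone suspecting a similar problem, one way to check from the server side is to watch the Twilio Media Streams messages themselves. The sketch below is a hypothetical detector: it flags jumps in sequenceNumber and media frames whose wall-clock arrival lags the stream's media.timestamp by more than a threshold (500 ms here, an arbitrary choice). It assumes a single inbound track and that each parsed WebSocket message is passed to on_message.

```python
import time
from typing import Callable, List, Optional


class MediaStreamGapDetector:
    """Hypothetical detector for gaps and stalls in Twilio Media Streams messages.

    * sequenceNumber jumps suggest messages that never arrived.
    * "Lateness" compares wall-clock arrival with media.timestamp: audio that
      arrives much later than the stream clock implies points to a stall.
    Assumes a single inbound audio track.
    """

    def __init__(self, max_late_ms: float = 500.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_late_ms = max_late_ms
        self._clock = clock
        self._last_seq: Optional[int] = None
        self._min_offset_ms: Optional[float] = None  # smallest arrival offset seen so far
        self._in_stall = False
        self.problems: List[str] = []

    def on_message(self, msg: dict) -> None:
        seq_raw = msg.get("sequenceNumber")  # Twilio sends numbers as strings
        if seq_raw is not None:
            seq = int(seq_raw)
            if self._last_seq is not None and seq != self._last_seq + 1:
                self.problems.append(f"sequence jump {self._last_seq} -> {seq}")
            self._last_seq = seq

        if msg.get("event") != "media":
            return

        media_ts_ms = int(msg["media"]["timestamp"])
        offset_ms = self._clock() * 1000.0 - media_ts_ms  # arrival time minus stream time
        if self._min_offset_ms is None or offset_ms < self._min_offset_ms:
            self._min_offset_ms = offset_ms
        lateness_ms = offset_ms - self._min_offset_ms

        # Report each stall once, when it starts; reset once frames catch up.
        if lateness_ms > self.max_late_ms:
            if not self._in_stall:
                self.problems.append(f"media frame {seq_raw} arrived {lateness_ms:.0f} ms late")
            self._in_stall = True
        else:
            self._in_stall = False
```

Logging the collected problems per call, tagged with the called number, would make a number-specific pattern like this one easier to spot.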

As a workaround, we created a new number, and everything is now functioning perfectly without any delays or missed events. We’ll share more details if Twilio provides further clarification on the underlying cause.