Hi everyone,
We’re developing a call center system using the OpenAI Realtime API, but we’re running into an issue with detecting the speech_stopped event in real time. The speech_started event is triggered correctly, but there’s a delay with speech_stopped: sometimes it takes up to 15 seconds to fire, although it does trigger immediately on some occasions.
We’re using Twilio for the voice interface, and the delay in detecting speech_stopped disrupts the call flow because, right afterward, we handle the response.function_call_arguments.done event to retrieve information from a RAG (Retrieval-Augmented Generation) system.
Has anyone encountered similar issues or found ways to optimize server_vad for better responsiveness? Are there specific parameters that might help minimize this delay?
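For context, here is a minimal sketch of the kind of configuration we mean. It assumes the turn_detection fields described in the Realtime API docs (threshold, prefix_padding_ms, silence_duration_ms) sent in a session.update event; the helper name build_session_update and the values shown are illustrative only, not our production settings or a recommendation.

```python
import json


def build_session_update(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Build a session.update event that configures server-side VAD.

    build_session_update is a hypothetical helper; the default values are
    placeholders for illustration, not tuned recommendations.
    """
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Activation threshold from 0.0 to 1.0; higher values need louder audio,
                # which may help on noisy phone lines.
                "threshold": threshold,
                # Audio to include before the detected start of speech.
                "prefix_padding_ms": prefix_padding_ms,
                # Silence required before speech is considered stopped.
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


# Serialize the event; it would then be sent over the already-open Realtime WebSocket.
payload = json.dumps(build_session_update(threshold=0.6, silence_duration_ms=300))
```

Our guess is that background noise on the phone line could be keeping the VAD from ever seeing enough silence, which is why threshold and silence_duration_ms are the two values we would try adjusting first, but we have not confirmed this.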
Any insights or parameter recommendations would be greatly appreciated!
Thanks in advance!