I’m using the GPT Realtime API with VAD enabled to test speech input → text output.
Until yesterday everything worked fine, but now I’m seeing a problem with the events:
• After sending audio, I consistently receive input_audio_buffer.speech_started.
• However, input_audio_buffer.speech_stopped is no longer firing reliably.
• It does fire occasionally, but in most cases no further events (speech_stopped, transcription events, etc.) arrive after speech_started.
It looks like this started happening suddenly without any code changes on my side.
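To make the symptom concrete, here is a minimal sketch of how the received events can be checked for a speech_started that never gets a matching speech_stopped. The helper name find_unclosed_speech is hypothetical, and I am assuming each of these events carries an item_id field that pairs a start with its stop; the events are assumed to be already parsed from JSON into dicts.

```python
from typing import Dict, Iterable, List

STARTED = "input_audio_buffer.speech_started"
STOPPED = "input_audio_buffer.speech_stopped"


def find_unclosed_speech(events: Iterable[Dict]) -> List[str]:
    """Return item_ids that saw speech_started but no later speech_stopped.

    Assumption: both event types include an "item_id" that ties them together.
    """
    open_items: List[str] = []
    for event in events:
        item_id = event.get("item_id", "")
        if event.get("type") == STARTED:
            # Speech began for this item; remember it until we see the stop.
            open_items.append(item_id)
        elif event.get("type") == STOPPED and item_id in open_items:
            # Matching stop arrived, so this item is no longer pending.
            open_items.remove(item_id)
    return open_items
```

Running this over the events logged from one session should return an empty list when VAD behaves normally; in my case it would return the item_id of almost every utterance.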
I’d like to understand what I should check or adjust in this situation:
• Are there any recent changes to VAD behavior or supported models that could affect speech_stopped?
• Is there any known issue where speech_started is emitted, but subsequent VAD-related events (speech_stopped, transcription events, etc.) are suppressed?
• Are there recommended settings (e.g. turn_detection type or parameters) to make sure speech_stopped is emitted reliably?
• Is there a way to debug why the server thinks speech has not stopped (e.g. extra logging, thresholds, silence duration)?
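For reference, this is roughly the kind of configuration the turn_detection question is about. It is only a sketch: the function name build_vad_session_update is hypothetical, the field names (threshold, prefix_padding_ms, silence_duration_ms under session.turn_detection) are my understanding of the session.update event and may be nested differently depending on the API version, and the numeric values are illustrative rather than confirmed defaults.

```python
import json


def build_vad_session_update(
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> dict:
    """Build a session.update payload that sets server VAD parameters.

    Assumption: the server accepts turn_detection directly under "session".
    The values here are placeholders to tune, not recommended settings.
    """
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Higher threshold = louder audio needed to count as speech.
                "threshold": threshold,
                # Audio kept from before the detected start of speech.
                "prefix_padding_ms": prefix_padding_ms,
                # Silence required before the server decides speech stopped.
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


if __name__ == "__main__":
    # Serialize the payload as it would be sent over the WebSocket.
    print(json.dumps(build_vad_session_update(silence_duration_ms=800)))
```

If a higher threshold or a different silence_duration_ms would make speech_stopped fire more reliably (for example when background noise keeps the audio above the threshold), that is the kind of adjustment I am asking about.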