We’re experiencing an intermittent but reproducible issue with the OpenAI Realtime API where the session freezes at a random point during a conversation. The assistant responds correctly several times, but at some point stops reacting to the user’s audio entirely.
**Environment**

- OpenAI Realtime API via WebSocket
- Browser-based app using AudioWorklet for microphone capture
- Production environment
**Steps to reproduce**

1. Start a Realtime API session
2. Have a normal back-and-forth conversation; the assistant responds correctly multiple times
3. At a random point, attempt to speak after the assistant finishes a response
4. The session freezes; this is more likely to happen when external noise is present (e.g. keyboard typing)
**Expected behavior**

The assistant consistently detects speech, fires `input_audio_buffer.speech_started`, and responds.
**Actual behavior**

At a random point during the conversation, the WebSocket goes completely silent after an assistant response. No further server events arrive: no `speech_started`, no `speech_stopped`, nothing. The AudioWorklet continues processing and sending audio chunks, but the server never acknowledges them. The session appears alive on the client side but is effectively frozen; the only recovery is a full session restart.
**What we observe in the logs**

After the freeze, every single audio chunk coming from the AudioWorklet is flagged as silent, even when the user is speaking loudly. This continues indefinitely with no server-side reaction:
```text
[AudioWorklet] First audio chunk is silent (max: 41)
[AudioWorklet] First audio chunk is silent (max: 78)
[AudioWorklet] First audio chunk is silent (max: 96)
... (hundreds of consecutive entries)
```
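For context, the log line above comes from a peak-amplitude check roughly like the following simplified sketch (the threshold value here is illustrative, not our exact production constant):

```typescript
// Simplified sketch of the silence check that emits the log lines above.
// The worklet posts Int16Array chunks; we log the peak absolute sample.
// SILENCE_PEAK_THRESHOLD is illustrative, not our exact production value.
const SILENCE_PEAK_THRESHOLD = 500 // int16 scale (full scale = 32767)

function isChunkSilent(chunk: Int16Array): boolean {
  let max = 0
  for (let i = 0; i < chunk.length; i++) {
    const abs = Math.abs(chunk[i])
    if (abs > max) max = abs
  }
  if (max < SILENCE_PEAK_THRESHOLD) {
    console.log(`[AudioWorklet] First audio chunk is silent (max: ${max})`)
    return true
  }
  return false
}
```

Peaks of 41–96 are far below any plausible speech level on the int16 scale, which is why every chunk is flagged even while the user is talking.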
**What we have ruled out**

- The WebSocket connection does not drop or throw an error
- The issue is not related to audio volume; speaking louder does not recover the session
- It is not consistently tied to switching applications, though that increases the likelihood
- The assistant does respond correctly multiple times before the freeze occurs
**Production-specific code that may be relevant**

We noticed this issue only occurs in production. Our production build has three mechanisms that do not exist in our development build:
1. **Client-side VAD gate**: audio chunks with RMS energy below a threshold are replaced with zeros before being sent to OpenAI:
```ts
const isSilent = this.lastMicrophoneEnergy < this.clientVADThreshold // 0.03
const dataToSend = isSilent ? new Int16Array(audioData.length) : audioData
```
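To make the gate self-contained, here is a simplified sketch including the energy computation; we assume the RMS is taken over samples normalized to [-1, 1], which matches the 0.03 threshold scale:

```typescript
// Self-contained sketch of the client VAD gate (simplified). We assume
// RMS is computed on samples normalized to [-1, 1], matching the 0.03
// threshold scale used in production.
const CLIENT_VAD_THRESHOLD = 0.03

function rmsEnergy(chunk: Int16Array): number {
  let sumSquares = 0
  for (let i = 0; i < chunk.length; i++) {
    const s = chunk[i] / 32768 // normalize int16 sample to [-1, 1]
    sumSquares += s * s
  }
  return Math.sqrt(sumSquares / chunk.length)
}

function gateChunk(chunk: Int16Array): Int16Array {
  // Below threshold: replace the payload with zeros of the same length,
  // so the send cadence is unchanged but the server receives pure silence.
  return rmsEnergy(chunk) < CLIENT_VAD_THRESHOLD
    ? new Int16Array(chunk.length)
    : chunk
}
```

Note the send cadence never changes: the server keeps receiving chunks at the normal rate, but their content becomes all-zero silence whenever the gate closes.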
2. **Higher interruption energy threshold**: production requires significantly more energy to treat user speech as an interruption while the assistant is speaking:
```ts
// Production
private readonly INTERRUPTION_ENERGY_THRESHOLD = 0.1

// Development
private readonly INTERRUPTION_ENERGY_THRESHOLD = 0.02
```
3. **Stuck VAD watchdog**: if the server remains in the `speech_started` state for 30 seconds without transitioning, production sends an `input_audio_buffer.clear` to reset it:
```ts
this.stuckVADTimeout = setTimeout(() => {
  this.gatewayClient.sendOpenAIMessage({ type: 'input_audio_buffer.clear' })
}, 30_000)
```
Our theory is that external noise (e.g. keyboard typing) triggers `speech_started` on the server, the client VAD gate then starts sending zeros, and this combination leaves the server-side VAD in an inconsistent state from which it never recovers, resulting in the frozen session.
**Questions for the community**

- Has anyone experienced the WebSocket entering a state where it appears connected but stops receiving server events?
- Is there a known issue with the server VAD getting stuck after receiving a mix of real audio and silence (zeros)?
- Are there recommended patterns for detecting and recovering from this frozen state without doing a full session restart?
- We are considering migrating from WebSocket to WebRTC for the Realtime API. Has anyone made this migration and found it more stable for browser-based use cases? Did it solve similar freezing or silent-session issues? Are there trade-offs or known limitations we should be aware of before committing to that approach?