Hi!
Using the realtime API via WebRTC, how can we now determine when the AI begins and finishes talking?
We used to use the output_audio_buffer events, but they appear to have been removed.
We can do audio volume analysis, obviously, but we need to account for natural pauses and it’s generally less accurate. Is there a “correct” way to do this with the new API?
Unfortunately that event fires when the audio message has finished streaming its data, not when it has finished playing. You could use it to estimate, but it’s not nearly as useful as the old event.
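If you do go the estimation route, one approach is to sum the duration of the received PCM16 deltas and project when the queued audio will stop playing. A minimal sketch, assuming 24 kHz mono PCM16 (the Realtime API’s default output format); the class and method names are illustrative, not part of any SDK:

```typescript
const SAMPLE_RATE = 24_000;   // samples per second (assumed output format)
const BYTES_PER_SAMPLE = 2;   // PCM16

class PlaybackEstimator {
  private bufferedBytes = 0;
  private playbackStartMs: number | null = null;

  // Call for each audio delta chunk, passing its byte length and a
  // millisecond timestamp (e.g. performance.now()).
  onDelta(chunkBytes: number, nowMs: number): void {
    if (this.playbackStartMs === null) this.playbackStartMs = nowMs;
    this.bufferedBytes += chunkBytes;
  }

  // Estimated millisecond timestamp at which the buffered audio finishes
  // playing, or null if no audio has arrived yet.
  estimatedEndMs(): number | null {
    if (this.playbackStartMs === null) return null;
    const durationMs =
      (this.bufferedBytes / BYTES_PER_SAMPLE / SAMPLE_RATE) * 1000;
    return this.playbackStartMs + durationMs;
  }
}
```

This assumes playback starts as soon as the first delta arrives and is never stalled, which is why it is only an estimate.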
Correct… the .done event signals the end of audio data from gpt-realtime to the socket. You have to implement a playback FIFO queue that takes .delta chunks and queues them. Audio is only done when two things have happened: you have received the .done event from the socket AND you have exhausted the playback queue. Be careful not to assume that either one of those things by itself means you’re done with playback!

By the way, this mechanism is also really important for handling interruptions… The server may have sent you 10 seconds of audio and already sent the .done event. That audio is still playing when the user wants to interrupt, so you have to zero out your local queue of pending audio, not just assume the realtime API can handle it.
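The bookkeeping described above can be sketched like this. It is a minimal illustration, not the actual SDK API: the chunk type and method names are made up, and wiring the events to your socket and audio player is left out.

```typescript
type AudioChunk = Uint8Array;

class PlaybackQueue {
  private queue: AudioChunk[] = [];
  private doneReceived = false;

  // Call for each audio .delta event from the socket.
  enqueue(chunk: AudioChunk): void {
    this.queue.push(chunk);
  }

  // Call on the .done event: the data stream has ended,
  // but playback may still be running.
  markDone(): void {
    this.doneReceived = true;
  }

  // Called by the audio player to pull the next chunk.
  dequeue(): AudioChunk | undefined {
    return this.queue.shift();
  }

  // True only when BOTH conditions hold: the stream ended
  // AND everything queued has been played.
  isFinished(): boolean {
    return this.doneReceived && this.queue.length === 0;
  }

  // User interrupted: drop all pending audio immediately so the
  // assistant stops talking, rather than draining the queue.
  interrupt(): void {
    this.queue = [];
    this.doneReceived = true;
  }
}
```

Keeping the two conditions separate is the point: `markDone()` alone does not make `isFinished()` true while chunks remain queued, and an empty queue alone does not either, since more deltas may still arrive.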
It looks like they’ve started sending the output_audio_buffer events again! They’re still missing from the docs, though. So my problem is “fixed”, but it’d be handy if someone from the team would weigh in.