Which event determines when the AI voice stops speaking?

I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: an event that tells me when the AI stops talking.

// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
  console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}

From my research, the docs say to use the event with type response.done, but that fires as soon as the WebSocket receives the message. The AI voice continues to speak for several seconds after that event.

I want to take actions like forwarding a call or ending a call after the closing phrase is said. Right now I just fire my function after a fixed delay following the response.done event, but this cuts the AI voice off if it speaks a longer sentence.

Is there any event I'm missing that determines when the output buffer speech stops?


OpenAI cannot know this, because the output is generated faster than real time, so the event has to come from your playback device (e.g. the speaker).

Depending on your use case, you can take a look at the OpenAI - Twilio integration. They use "marks" to solve this:
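Roughly, the mark technique works like this (a minimal sketch, not the demo's actual code; the function names and `streamSid` handling here are assumptions): after forwarding each AI audio chunk to the Twilio media stream, you send a `mark` message, and Twilio echoes a `mark` event back only once the caller has actually heard that chunk, so when the last queued mark comes back, playback is finished.

```javascript
// Marks still waiting to be echoed back by Twilio.
const pendingMarks = new Set();

// Build the mark message to send right after forwarding an audio chunk.
// streamSid comes from Twilio's "start" event on the media stream.
function buildMark(streamSid, name) {
  pendingMarks.add(name);
  return JSON.stringify({ event: 'mark', streamSid, mark: { name } });
}

// Handle an incoming Twilio media-stream message; returns true once
// every queued mark has been played back (i.e. the AI finished speaking).
function handleTwilioEvent(raw) {
  const msg = JSON.parse(raw);
  if (msg.event === 'mark') pendingMarks.delete(msg.mark.name);
  return pendingMarks.size === 0;
}
```

You would call `buildMark` once per forwarded chunk and check `handleTwilioEvent` on every inbound message; when it returns true after the final chunk, it is safe to forward or end the call.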

// This is a solution I use: poll WebRTC stats for the remote audio level.
// `peerConnection` is the RTCPeerConnection carrying the AI's audio track.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function speaking_monitoring() {
  const silence_threshold = 0.3; // sec of silence before we call it "stopped"
  const check_interval = 0.1; // sec between polls
  let silence_count = 0;
  while (true) {
    try {
      const stats = await peerConnection.getStats(null);
      stats.forEach((report) => {
        // report.type === "media-source" would be the user speaking;
        // "inbound-rtp" audio is what the AI is sending us.
        if (report.type === "inbound-rtp" && report.kind === "audio") {
          if (report.audioLevel > 0.1) {
            silence_count = 0; // reset the counter while audio is flowing
            log("AI is speaking");
          } else {
            silence_count++;
            if (silence_count > silence_threshold / check_interval) {
              log("AI is NOT speaking");
            }
          }
        }
      });
      await wait(check_interval * 1000);
    } catch (error) {
      console.error('Error with speaking_monitoring:', error);
      return;
    }
  }
}

This is now possible.
This is the response that I receive:

{"type":"output_audio_buffer.audio_stopped","event_id":"event_123456","response_id":"resp_123456"}

I was trying to find the end of the conversation so this in itself is not sufficient for that purpose. It only determines when the AI has stopped speaking.

I use function calling, based on my specific use case, to determine whether all objectives of the conversation are complete, and then end the session based on this event.
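That combination can be sketched as follows (the tool name `end_call` and the handler names are hypothetical, not from the API): expose a function tool the model calls when the conversation's objectives are met, then actually hang up on the next stopped-audio event so the closing phrase is not cut off.

```javascript
// Hypothetical tool definition: lets the model signal that every
// objective of the conversation is complete.
const endCallTool = {
  type: 'function',
  name: 'end_call', // hypothetical name
  description: 'Call when every objective of the conversation is complete.',
  parameters: {
    type: 'object',
    properties: {
      reason: { type: 'string', description: 'Why the call is ending.' },
    },
    required: ['reason'],
  },
};

let endRequested = false;

// Called when the model invokes a tool.
function onFunctionCall(name) {
  if (name === 'end_call') endRequested = true;
}

// Called on the stopped-audio event; hangUp is your telephony hook.
function onAudioStopped(hangUp) {
  if (endRequested) hangUp(); // only after the goodbye finished playing
}
```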

This is great news! Thank you for sharing. I knew there had to be something eventually!

@vnandan I don't see that event in the documentation.

@rob266 Isn't this the event that you were asking for? response.audio.done

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.