Which event determines when the AI voice stops speaking?

I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: an event that tells me when the AI stops talking.

// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
  console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}

From my research, the docs say to use the response.done event, but this fires as soon as the message arrives over the WebSocket. The AI voice continues to speak for several seconds after that event.

I want to take actions like forwarding a call or ending a call after the closing phrase is said. Right now I just have a fixed delay in seconds after the response.done event before firing my function, but this cuts the AI voice off if it speaks a longer sentence.

Is there any event I'm missing that determines when the output buffer's speech stops?


OpenAI cannot know this, because the output is generated faster than real-time, and thus the event needs to come from your playback device (e.g. speaker).
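If you are playing the audio yourself (for example, over the plain WebSocket transport), one way to get a real "stopped speaking" signal is to track your own playback clock. Here is a minimal sketch using the Web Audio API, assuming pcm16 mono output at the API's default 24 kHz; onPlaybackFinished is a made-up hook for your own logic:

// Schedule each audio delta on an AudioContext and remember when the
// last scheduled chunk will finish playing.
// Assumes pcm16 mono output at 24 kHz (the Realtime API default).
const audioCtx = new AudioContext({ sampleRate: 24000 });
let playbackEndTime = 0; // AudioContext time at which playback will be done

function playDelta(base64Audio) {
  const bytes = Uint8Array.from(atob(base64Audio), (c) => c.charCodeAt(0));
  const pcm16 = new Int16Array(bytes.buffer);
  const floats = Float32Array.from(pcm16, (s) => s / 32768);

  const buffer = audioCtx.createBuffer(1, floats.length, 24000);
  buffer.copyToChannel(floats, 0);

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  // Queue this chunk right after the previous one (or now, if we fell behind).
  const startAt = Math.max(audioCtx.currentTime, playbackEndTime);
  source.start(startAt);
  playbackEndTime = startAt + buffer.duration;

  // Only the final queued chunk ends at playbackEndTime, so this fires
  // when the AI has actually stopped speaking on this device.
  source.onended = () => {
    if (audioCtx.currentTime >= playbackEndTime - 0.05) {
      onPlaybackFinished(); // your hook: forward the call, hang up, etc.
    }
  };
}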

Depending on your use case, you can take a look at the OpenAI-Twilio integration. They use "marks" to solve this:
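Roughly, the marks technique works like this: you forward the model's audio to Twilio, then queue a named mark behind it, and Twilio echoes the mark back only after the audio ahead of it has actually been played to the caller. A sketch under those assumptions, where twilioWs is the Media Streams WebSocket and streamSid was captured from Twilio's start event:

// 1. Forward each Realtime audio delta to Twilio as a media message.
function sendAudioToTwilio(base64Audio) {
  twilioWs.send(JSON.stringify({
    event: 'media',
    streamSid,
    media: { payload: base64Audio },
  }));
}

// 2. When the model's response is done (e.g. on response.done), queue a mark
//    behind the audio you just sent.
function sendEndMark() {
  twilioWs.send(JSON.stringify({
    event: 'mark',
    streamSid,
    mark: { name: 'response_end' },
  }));
}

// 3. Twilio echoes the mark back only once the audio queued before it
//    has finished playing to the caller.
twilioWs.on('message', (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === 'mark' && msg.mark.name === 'response_end') {
    // The caller has now heard the whole response.
  }
});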

// This is a solution I use
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function speaking_monitoring() {
  const silence_threshold = 0.3; // sec of silence before declaring "not speaking"
  const check_interval = 0.1;    // sec between polls
  let silence_count = 0;

  while (true) {
    try {
      const stats = await peerConnection.getStats(null);
      stats.forEach((report) => {
        // report.type === "media-source" would be the user's own mic;
        // "inbound-rtp" audio is what the AI is sending us
        if (report.type === "inbound-rtp" && report.kind === "audio") {
          if (report.audioLevel > 0.1) {
            silence_count = 0; // reset the counter whenever speech is detected
            log("AI is speaking");
          } else {
            silence_count++;
            if (silence_count > silence_threshold / check_interval) {
              log("AI is NOT speaking");
            }
          }
        }
      });
      await wait(check_interval * 1000);
    } catch (error) {
      console.error('Error with speaking_monitoring:', error);
      return;
    }
  }
}


This is now possible.
This is the response that I receive:

{"type":"output_audio_buffer.audio_stopped","event_id":"event_123456","response_id":"resp_123456"}

I was trying to find the end of the conversation, so this in itself is not sufficient for that purpose. It only determines when the AI has stopped speaking.

I use function calling based on my specific use case to determine whether all objectives of the conversation are complete, and then end the session based on this event.
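A minimal sketch of that pattern, assuming the WebRTC data channel dc carries the server events; the end_conversation tool name and hangUp() are placeholders for your own logic (later posts in this thread report the event name as output_audio_buffer.stopped, so the sketch checks both):

// End the session only when (a) a function call has flagged the
// conversation as complete and (b) the AI has actually finished speaking.
let objectivesComplete = false;

dc.addEventListener('message', (e) => {
  const event = JSON.parse(e.data);

  if (event.type === 'response.function_call_arguments.done' &&
      event.name === 'end_conversation') {
    objectivesComplete = true; // the model decided all objectives are met
  }

  if ((event.type === 'output_audio_buffer.audio_stopped' ||
       event.type === 'output_audio_buffer.stopped') && objectivesComplete) {
    hangUp(); // safe to end: the closing phrase has finished playing
  }
});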


This is great news! Thank you for sharing. I knew there had to be something eventually!

@vnandan I don’t see that event in the documentation

@rob266 Isn’t this the event that you were asking for? response.audio.done

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.


Edit: Never mind about the below! It's not in OpenAI's documentation, but it works, except that for me, at least, the event name has changed to output_audio_buffer.stopped.
That event fires exactly when we want, though. Thanks for the tip @vnandan!!

Original message:

Agree, @sashirestela. I don't see anything about this output_audio_buffer event…

Sadly, I don't think response.audio.done works, because that's also a server-side event that seems to track only when the audio data has finished being delivered, not when audio playback has finished.

@rob266 Were you able to confirm this actually works? I don't think it does. Whatever this event is, it doesn't seem to be part of OpenAI's event structure or to emanate from its API.

Edit: in case you missed the above, listening for the event output_audio_buffer.stopped seems like a great solution :)

Thanks for this! Do you find this works well? It seems like it could be a little brittle under different conditions. I wish there were a simpler way…

They removed it? I don't observe this event in the list of events OpenAI sends to my client…

Does it work for you in the current API version? I don't observe such events… did you change the config somehow to trigger them?

I am also seeking an event that indicates when the AI has finished speaking. In my situation, I am only receiving text responses. In Twilio, a mark event is sent when the audio has completely played, but it still arrives slightly early, while the AI is still speaking.

They recently updated the docs, and they now say that these output_audio_buffer events are available only via WebRTC.
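For anyone stuck on the WebSocket transport, a common workaround is to compute the response's audio duration from the deltas yourself and schedule your action for when local playback should finish. A sketch assuming pcm16 mono at 24 kHz and that playback starts roughly when the first delta arrives; onSpeechFinished is a made-up hook:

// Derive the spoken duration from the audio bytes, since the WebSocket
// transport does not send output_audio_buffer events.
let totalBytes = 0;
let playbackStart = null;

function handleServerEvent(event) {
  if (event.type === 'response.audio.delta') {
    if (playbackStart === null) playbackStart = Date.now();
    // base64 length * 3/4 approximates the decoded byte count
    totalBytes += Math.floor(event.delta.length * 3 / 4);
  }
  if (event.type === 'response.done' && playbackStart !== null) {
    const durationMs = (totalBytes / 2 / 24000) * 1000; // pcm16 mono @ 24 kHz
    // Fire when local playback should be finished, not when data arrived.
    const remainingMs = playbackStart + durationMs - Date.now();
    setTimeout(onSpeechFinished, Math.max(0, remainingMs));
    totalBytes = 0;
    playbackStart = null;
  }
}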
