Which event determines when the AI voice stops speaking?

I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: an event that tells me when the AI stops talking.

// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
  console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}

From my research, the docs say to use the event with type response.done, but that fires as soon as the WebSocket receives the message. The AI voice continues to speak for several seconds after that event.

I want to take actions like forwarding a call or ending a call after the closing phrase is said. Right now I just fire my function after a fixed delay following the response.done event, but this cuts the AI voice off if it speaks a longer sentence.

Is there any event I'm missing that determines when the output buffer speech stops?


OpenAI cannot know this, because the output is generated faster than real time, so the event has to come from your playback device (e.g. the speaker).

Depending on your use case, you can take a look at the OpenAI - Twilio integration. They use "marks" to solve this:
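Roughly, the mark technique works like this (a minimal sketch, not the demo's actual code; the function names and `streamSid` handling here are assumptions): after forwarding each AI audio chunk to the Twilio media stream, you send a `mark` message, and Twilio echoes a `mark` event back only once the caller has actually heard that chunk, so when the last queued mark comes back, playback is finished.

```javascript
// Marks still waiting to be echoed back by Twilio.
const pendingMarks = new Set();

// Build the mark message to send right after forwarding an audio chunk.
// streamSid comes from Twilio's "start" event on the media stream.
function buildMark(streamSid, name) {
  pendingMarks.add(name);
  return JSON.stringify({ event: 'mark', streamSid, mark: { name } });
}

// Handle an incoming Twilio media-stream message; returns true once
// every queued mark has been played back (i.e. the AI finished speaking).
function handleTwilioEvent(raw) {
  const msg = JSON.parse(raw);
  if (msg.event === 'mark') pendingMarks.delete(msg.mark.name);
  return pendingMarks.size === 0;
}
```

You would call `buildMark` once per forwarded chunk and check `handleTwilioEvent` on every inbound message; when it returns true after the final chunk, it is safe to forward or end the call.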

// This is a solution I use: poll WebRTC stats for the remote audio level.
// `peerConnection` is the RTCPeerConnection carrying the AI's audio track.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function speaking_monitoring() {
  const silence_threshold = 0.3; // sec of silence before we call it "stopped"
  const check_interval = 0.1; // sec between polls
  let silence_count = 0;
  while (true) {
    try {
      const stats = await peerConnection.getStats(null);
      stats.forEach((report) => {
        // report.type === "media-source" would be the user speaking;
        // "inbound-rtp" audio is what the AI is sending us.
        if (report.type === "inbound-rtp" && report.kind === "audio") {
          if (report.audioLevel > 0.1) {
            silence_count = 0; // reset the counter while audio is flowing
            log("AI is speaking");
          } else {
            silence_count++;
            if (silence_count > silence_threshold / check_interval) {
              log("AI is NOT speaking");
            }
          }
        }
      });
      await wait(check_interval * 1000);
    } catch (error) {
      console.error('Error with speaking_monitoring:', error);
      return;
    }
  }
}

This is now possible.
This is the response that I receive:

{"type":"output_audio_buffer.audio_stopped","event_id":"event_123456","response_id":"resp_123456"}

I was trying to find the end of the conversation so this in itself is not sufficient for that purpose. It only determines when the AI has stopped speaking.

I use function calling, based on my specific use case, to determine whether all objectives of the conversation are complete, and then end the session based on this event.
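That combination can be sketched as follows (the tool name `end_call` and the handler names are hypothetical, not from the API): expose a function tool the model calls when the conversation's objectives are met, then actually hang up on the next stopped-audio event so the closing phrase is not cut off.

```javascript
// Hypothetical tool definition: lets the model signal that every
// objective of the conversation is complete.
const endCallTool = {
  type: 'function',
  name: 'end_call', // hypothetical name
  description: 'Call when every objective of the conversation is complete.',
  parameters: {
    type: 'object',
    properties: {
      reason: { type: 'string', description: 'Why the call is ending.' },
    },
    required: ['reason'],
  },
};

let endRequested = false;

// Called when the model invokes a tool.
function onFunctionCall(name) {
  if (name === 'end_call') endRequested = true;
}

// Called on the stopped-audio event; hangUp is your telephony hook.
function onAudioStopped(hangUp) {
  if (endRequested) hangUp(); // only after the goodbye finished playing
}
```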

This is great news! Thank you for sharing. I knew there had to be something eventually!

@vnandan I don't see that event in the documentation.

@rob266 Isn't this the event that you were asking for? response.audio.done

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.