I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: an event that tells me when the AI stops talking.
// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}
From my research, the docs say to check for type === 'response.done', but this event only fires once the WebSocket receives the message. The AI voice continues to speak for several seconds after that event.
I want to take actions like forwarding a call or ending a call after the closing phrase is said. Right now I just wait a fixed number of seconds after the response.done event before firing my function, but this cuts the AI voice off if it speaks a longer sentence.
Is there any event I'm missing that determines when the output buffer speech stops?
OpenAI cannot know this, because the output is generated faster than real time, so the event needs to come from your playback device (e.g. the speaker).
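Because the audio arrives faster than real time, one client-side workaround is to estimate the remaining playback time from the amount of PCM you have queued. This is a sketch under assumptions, not an official API: the 24 kHz mono 16-bit figures correspond to the Realtime API's default pcm16 output format, and the function name is my own.

```javascript
// Sketch: estimate how long queued audio will take to play back.
// Assumes the Realtime API's default output format, mono PCM16 at 24 kHz
// (2 bytes per sample). Adjust if you negotiated a different format.
function playbackEndDelayMs(queuedBytes, sampleRate = 24000, bytesPerSample = 2) {
  const samples = queuedBytes / bytesPerSample;
  return (samples / sampleRate) * 1000;
}

// Example: after response.done, wait this long before ending the call.
// 48000 bytes of PCM16 at 24 kHz is exactly one second of audio.
const delay = playbackEndDelayMs(48000); // 1000 ms
```

This replaces the fixed-seconds hack with a delay proportional to how much audio is actually left to play.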
It depends on your use case, but you can take a look at the OpenAI-Twilio integration. They use "marks" to solve this:
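For reference, the marks approach looks roughly like this. It is a minimal sketch assuming the Twilio Media Streams WebSocket protocol, where Twilio echoes a mark message back once playback reaches that point in the stream; buildMarkMessage, handleTwilioMessage, and the mark name 'ai-response-end' are illustrative names, not part of any SDK.

```javascript
// Sketch: build the "mark" message to send over the Twilio Media Streams
// WebSocket right after forwarding the last chunk of AI audio.
function buildMarkMessage(streamSid, name) {
  return JSON.stringify({
    event: 'mark',
    streamSid,
    mark: { name },
  });
}

// Twilio sends the mark back once the caller has actually heard everything
// queued before it, so this is the real "playback finished" signal.
function handleTwilioMessage(raw, onPlaybackDone) {
  const msg = JSON.parse(raw);
  if (msg.event === 'mark' && msg.mark.name === 'ai-response-end') {
    onPlaybackDone(); // e.g. forward or end the call here
  }
}
```
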
I was trying to find the end of the conversation, so this in itself is not sufficient for that purpose. It only determines when the AI has stopped speaking.
I use function calling, based on my specific use case, to determine whether all objectives of the conversation are complete, and then end the session based on this event.
Edit: Never mind about the below! It's not in OpenAI's documentation, but it works; for me, at least, the event name has changed to output_audio_buffer.stopped.
That event fires exactly when we want though. Thanks for the tip @vnandan !!
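Handling that alongside the documented events might look like the dispatcher below. Note that output_audio_buffer.stopped is undocumented, as discussed in this thread, and the actions object with its callback names is my own illustration, not part of the API.

```javascript
// Sketch: route Realtime API server events to app-level callbacks.
// output_audio_buffer.stopped is the undocumented event discussed above;
// the `actions` callbacks are illustrative names, not part of the API.
function handleRealtimeEvent(event, actions) {
  switch (event.type) {
    case 'input_audio_buffer.speech_stopped':
      actions.onUserStopped();     // caller finished talking
      break;
    case 'response.done':
      actions.onResponseDone();    // generation finished (playback may continue)
      break;
    case 'output_audio_buffer.stopped':
      actions.onPlaybackStopped(); // safe to forward or end the call now
      break;
  }
}
```
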
Original message:
Agree @sashirestela. I don't see anything about this output_audio_buffer event…
Sadly I don't think response.audio.done works, because that's also a server-side event that seems to track only when the audio data has finished delivering, not when audio playback has finished.
@rob266 Were you able to confirm this actually works? I don't think it does. Whatever this event is, it doesn't seem to be part of OpenAI's event structure or to emanate from its API.
I am also seeking an event that indicates when the AI has finished speaking. In my situation, I am only receiving text responses. In Twilio, a mark event is sent when the audio has completely played, but it still arrives slightly early, while the AI is still speaking.