I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: I want to determine when the AI stops talking.
// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}
From my research, the docs say to use the response.done event type, but this fires as soon as the WebSocket receives the message; the AI voice continues to speak for several seconds after that event.
I want to take actions like forwarding or ending a call after the closing phrase is said. Right now I just use a fixed delay of a few seconds after the response.done event to fire off my function, but this cuts the AI voice off if it speaks a longer sentence.
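For reference, my current workaround looks roughly like this (ws, forwardCall, and the delay value are placeholders for my own plumbing):

// Current workaround: fixed delay after response.done.
const FORWARD_DELAY_MS = 3000; // guess at remaining playback time

ws.on('message', (raw) => {
  const response = JSON.parse(raw);
  if (response.type === 'response.done') {
    // response.done fires when generation finishes, not when playback does,
    // so a long final sentence can still be playing when this timer expires.
    setTimeout(forwardCall, FORWARD_DELAY_MS);
  }
});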
Is there any event I'm missing that determines when the output buffer speech stops?
OpenAI cannot know this, because the output audio is generated faster than real-time, so the event needs to come from your playback device (e.g. the speaker).
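For example, if your client plays the audio in a browser, here's a minimal sketch of detecting the real end of playback, assuming you schedule the 24 kHz PCM16 chunks from response.audio.delta with the Web Audio API (all names here are illustrative, not anything the API prescribes):

const audioCtx = new AudioContext({ sampleRate: 24000 });
let playhead = 0;      // time at which the next chunk should start
let lastSource = null; // most recently scheduled chunk

function playChunk(pcm16 /* Int16Array decoded from response.audio.delta */) {
  const buffer = audioCtx.createBuffer(1, pcm16.length, 24000);
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < pcm16.length; i++) channel[i] = pcm16[i] / 32768;

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  playhead = Math.max(playhead, audioCtx.currentTime);
  source.start(playhead);
  playhead += buffer.duration;
  lastSource = source;
}

// Once response.done arrives, generation is over, so the last scheduled
// chunk really is the last one; its onended marks the true end of playback.
function onResponseDone() {
  if (lastSource) lastSource.onended = () => {
    console.log('AI audio playback actually finished');
    // safe to forward / end the call here
  };
}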
Depending on your use case, you can take a look at the OpenAI-Twilio integration. They use "marks" to solve this.
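A rough sketch of that approach, assuming a Twilio Media Streams WebSocket (twilioWs) bridged to the Realtime API socket (openAiWs); streamSid and endCall() stand in for your own plumbing:

openAiWs.on('message', (raw) => {
  const msg = JSON.parse(raw);

  if (msg.type === 'response.audio.delta') {
    // Forward each audio chunk to Twilio for playback on the call.
    twilioWs.send(JSON.stringify({
      event: 'media',
      streamSid,
      media: { payload: msg.delta },
    }));
  }

  if (msg.type === 'response.done') {
    // Queue a mark after the last audio chunk; Twilio echoes it back
    // only once everything queued before it has actually been played.
    twilioWs.send(JSON.stringify({
      event: 'mark',
      streamSid,
      mark: { name: 'responseDone' },
    }));
  }
});

twilioWs.on('message', (raw) => {
  const msg = JSON.parse(raw);
  if (msg.event === 'mark' && msg.mark.name === 'responseDone') {
    // The caller has now heard the full response.
    endCall(); // or forward the call
  }
});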
I was trying to find the end of the conversation, so this in itself is not sufficient for that purpose; it only determines when the AI has stopped speaking.
For my specific use case, I use function calling to determine whether all objectives of the conversation are complete, and then end the session based on that event.
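As a sketch of that pattern (the tool name end_call and endSession() are my placeholders, not anything prescribed by the API):

// Advertise an "end_call" tool so the model can signal that every
// objective of the conversation has been met.
openAiWs.send(JSON.stringify({
  type: 'session.update',
  session: {
    tools: [{
      type: 'function',
      name: 'end_call',
      description: 'Call once all objectives are complete and the goodbye phrase has been spoken.',
      parameters: { type: 'object', properties: {} },
    }],
    tool_choice: 'auto',
  },
}));

openAiWs.on('message', (raw) => {
  const msg = JSON.parse(raw);
  if (msg.type === 'response.done') {
    // Completed function calls show up as items in the response output.
    const ended = (msg.response.output || []).some(
      (item) => item.type === 'function_call' && item.name === 'end_call'
    );
    if (ended) endSession(); // e.g. after the final playback mark comes back
  }
});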