I want the opposite of input_audio_buffer.speech_stopped, which fires when the user stops talking: an event that tells me when the AI stops talking.
// this gets called when the user stops talking
if (response.type === 'input_audio_buffer.speech_stopped') {
console.log('Event: input_audio_buffer.speech_stopped - User stopped speaking.');
}
From my research, the docs say to use response.done, but that fires as soon as the WebSocket receives the message; the AI voice keeps speaking for several seconds after the event.
I want to take actions like forwarding or ending a call after the closing phrase is spoken. Right now I just wait a fixed number of seconds after the response.done event before firing my function, but this cuts the AI voice off when it speaks a longer sentence (see the sketch below).
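Roughly what I have now (the delay value and forwardCall() are simplified placeholders):

// Current workaround: act a fixed time after response.done.
if (response.type === 'response.done') {
  console.log('Event: response.done - model finished generating the response.');
  setTimeout(() => {
    forwardCall(); // fires too early if the spoken reply runs longer than the delay
  }, 5000); // fixed delay; cuts the AI voice off on longer sentences
}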
Is there any event I’m missing that determines when the output-buffer audio stops playing?
OpenAI cannot know this, because the output is generated faster than real-time, and thus the event needs to come from your playback device (e.g. speaker).
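If you are playing the audio yourself (for example in the browser), you can track how far ahead you have scheduled the received chunks and only act once playback catches up to that point. A minimal sketch, assuming 24 kHz PCM16 deltas from response.audio.delta played through the Web Audio API; the helper names enqueueAudioDelta and onResponseDone are made up for illustration:

// Track how far ahead audio has been scheduled; when response.done arrives,
// wait until the speaker reaches that point before acting.
const audioCtx = new AudioContext({ sampleRate: 24000 });
let scheduledEnd = 0; // AudioContext time at which the last queued chunk ends

function enqueueAudioDelta(base64Pcm16) {
  // Decode the base64 PCM16 delta into float samples the Web Audio API can play.
  const bytes = Uint8Array.from(atob(base64Pcm16), c => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer);
  const buffer = audioCtx.createBuffer(1, samples.length, 24000);
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < samples.length; i++) channel[i] = samples[i] / 32768;

  // Queue the chunk right after whatever is already scheduled.
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  const startAt = Math.max(audioCtx.currentTime, scheduledEnd);
  source.start(startAt);
  scheduledEnd = startAt + buffer.duration;
}

function onResponseDone(afterPlayback) {
  // Fire the callback only once the queued audio has actually finished playing.
  const remainingMs = Math.max(0, (scheduledEnd - audioCtx.currentTime) * 1000);
  setTimeout(afterPlayback, remainingMs);
}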
Depending on your use case, you can take a look at the OpenAI - Twilio integration. They use “marks” to solve this:
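The idea: after forwarding the model’s audio to Twilio as media messages, you also send a mark message on the same stream; Twilio echoes the mark back once everything queued before it has actually been played to the caller, and that echo is your real “AI stopped talking” signal. A rough sketch, assuming a Twilio Media Streams WebSocket (ws) and a streamSid you already have; the mark name is arbitrary:

// After the last audio chunk of a response, queue a named mark.
function sendEndOfResponseMark(ws, streamSid) {
  ws.send(JSON.stringify({
    event: 'mark',
    streamSid: streamSid,
    mark: { name: 'responsePart' },
  }));
}

// In the Twilio WebSocket message handler:
ws.on('message', (data) => {
  const msg = JSON.parse(data);
  if (msg.event === 'mark' && msg.mark.name === 'responsePart') {
    // Playback of the model's audio has finished on the caller's side;
    // it is now safe to forward or end the call.
  }
});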