Issues with realtime turn-taking

I’ve observed two issues with realtime turn-taking that I’d like to report.

  1. The interrupt_response configuration doesn’t seem to have any effect. I’ve tried setting it to False, but the LLM will still stop talking if I start talking over it. Am I misunderstanding the purpose of this configuration? I expected the LLM to finish speaking its whole response with this config set to False. I’ve observed this using both server and semantic VAD.

  2. Semantic VAD often doesn’t pick up on short utterances by the caller, like “yeah” or “sure”. I noticed this problem because I have a tool call that sends an SMS, and I want the LLM to make sure the caller explicitly opts in first. So the LLM says something like, “Do you want me to send that text to your phone?” If the caller just says “yeah”, no input is registered. There’s no input_audio_buffer.speech_started event and the caller’s turn isn’t registered, so there’s silence on the call until the user says something more obvious like, “Yes, send the text.” Server VAD does a good job of picking up on these short utterances, so I’ve switched back to using it for now. When I was using semantic VAD, I had eagerness set to high.
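For reference, here is a minimal sketch of the two configurations described above, written as session.update payloads. It assumes the session shape where turn_detection sits directly under session (newer API versions may nest it elsewhere); the helper name turn_detection_update is made up for illustration, and the snippet only builds the JSON, it doesn't open a connection.

```python
import json

def turn_detection_update(vad_type, interrupt_response=False, eagerness=None):
    """Build a session.update event that sets turn detection.

    Hypothetical helper: the field names follow the thread
    (interrupt_response, eagerness); the nesting is an assumption.
    """
    turn_detection = {
        "type": vad_type,
        "create_response": True,
        "interrupt_response": interrupt_response,
    }
    # eagerness only applies to semantic VAD, so leave it out otherwise.
    if vad_type == "semantic_vad" and eagerness is not None:
        turn_detection["eagerness"] = eagerness
    return {"type": "session.update", "session": {"turn_detection": turn_detection}}

# Server VAD with interruption turned off (issue 1).
server_event = turn_detection_update("server_vad")

# Semantic VAD with high eagerness (the setup from issue 2).
semantic_event = turn_detection_update("semantic_vad", eagerness="high")

print(json.dumps(server_event, indent=2))
print(json.dumps(semantic_event, indent=2))
```

With either payload applied, I would expect the assistant to keep talking when the caller speaks over it, which is not what I observe.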

We’re observing the same behavior and want to confirm whether this is a bug or a misunderstanding of interrupt_response.

We’d like the AI to finish its spoken response even if the caller starts speaking, so we’ve set it to false.

However, this parameter seems to do nothing in practice.

As soon as input_audio_buffer.speech_started comes in, output_audio_buffer.cleared happens, which immediately cuts off the assistant’s audio mid-sentence.

This happens regardless of whether interrupt_response is set to true or false.

Curious if anyone has found a workaround or seen different behavior.
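In case it helps others confirm they are hitting the same thing, here is a rough sketch of how the pattern can be spotted in a captured event log. It assumes events have already been parsed into dicts with a "type" key; find_barge_in_cutoffs and sample_log are illustrative names, and the log below is hand-written, not real output.

```python
def find_barge_in_cutoffs(events):
    """Return (speech_started_index, cleared_index) pairs.

    A pair is recorded whenever output_audio_buffer.cleared shows up
    after an input_audio_buffer.speech_started event.
    """
    cutoffs = []
    speech_started_at = None
    for i, event in enumerate(events):
        event_type = event.get("type")
        if event_type == "input_audio_buffer.speech_started":
            speech_started_at = i
        elif event_type == "output_audio_buffer.cleared" and speech_started_at is not None:
            cutoffs.append((speech_started_at, i))
            speech_started_at = None  # wait for the next speech_started
    return cutoffs

# Hand-written example log showing the sequence described above.
sample_log = [
    {"type": "response.created"},
    {"type": "output_audio_buffer.started"},
    {"type": "input_audio_buffer.speech_started"},
    {"type": "output_audio_buffer.cleared"},
    {"type": "input_audio_buffer.speech_stopped"},
]

print(find_barge_in_cutoffs(sample_log))  # [(2, 3)]
```

If interrupt_response worked the way we expect, running this over a session with it set to false should return an empty list.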

Regarding the first point, I have the same problem. I worked around it by setting turn_detection to null for the initial responses where I don’t want the model to be interrupted, and then sending a session.update that sets the entire turn_detection object back to my VAD settings.
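Roughly, the workaround looks like the sketch below. This is a simplified illustration rather than my production code: UninterruptedResponse, VAD_SETTINGS, and the send callback are placeholder names, the session shape (turn_detection directly under session) is an assumption, and restoring on response.done is just one possible trigger.

```python
VAD_SETTINGS = {
    "type": "server_vad",
    "create_response": True,
    "interrupt_response": True,
}

class UninterruptedResponse:
    """Disable turn detection for one response, then restore it."""

    def __init__(self, send, vad_settings):
        self.send = send                  # callable that delivers a client event
        self.vad_settings = dict(vad_settings)
        self.active = False

    def start(self):
        # 1. Turn off turn detection so caller speech cannot interrupt.
        self.send({"type": "session.update", "session": {"turn_detection": None}})
        # 2. Ask the model to speak (e.g. the welcome message).
        self.send({"type": "response.create"})
        self.active = True

    def on_server_event(self, event):
        # 3. Once the response is done, send the full turn_detection object again.
        if self.active and event.get("type") == "response.done":
            self.send({
                "type": "session.update",
                "session": {"turn_detection": dict(self.vad_settings)},
            })
            self.active = False

# Usage with a list standing in for the real connection.
sent = []
guard = UninterruptedResponse(sent.append, VAD_SETTINGS)
guard.start()
guard.on_server_event({"type": "response.done"})
print([event["type"] for event in sent])
```

One caveat: response.done marks the end of generation, and audio may still be playing on the caller's side at that point, so it may be worth delaying the restore until playback has actually finished.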

@Sean-Der Do you know what’s going on here with the interrupt_response configuration?

I agree with @achill that being able to make certain responses non-interruptible would be really useful. A welcome message is a perfect example of one that we wouldn’t want cut off. Changing the turn detection settings just to accomplish this is a bit cumbersome, though, so would it be possible to allow a new field on the response.create client event that makes just the single resulting response non-interruptible?