Interrupt Realtime audio with a text message - WebRTC

Hi everyone,

I’ve been working with the WebRTC template.

When I interrupt the model with my voice, the audio output from the server stops immediately and restarts once I’ve finished speaking, as expected.

I want to replicate this behavior for text input: when sending a text message, I’d like the audio to stop immediately and then restart to respond to the text.

I’ve tried sending `response.cancel` and `conversation.item.truncate`, then `conversation.item.create` with an `input_text` item followed by `response.create`, but the audio keeps playing until the end of the transcript.
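
For reference, here is a minimal TypeScript sketch of that sequence as client events sent over the WebRTC data channel. The `EventSink` interface, the `PlaybackState` shape, and the item id are hypothetical stand-ins for state the app tracks itself, and `content_index: 0` assumes the audio is the first content part of the assistant item. Per this thread, this sequence on its own did not silence audio that was already streaming to the client.

```typescript
// Minimal interface that a browser RTCDataChannel satisfies; using it keeps
// the sketch testable outside the browser.
interface EventSink {
  send(data: string): void;
}

// Hypothetical state the app tracks itself from server events; this is not
// a Realtime API type.
interface PlaybackState {
  responseInProgress: boolean;    // a response is currently being generated
  assistantItemId: string | null; // assistant audio item currently playing
  playedMs: number;               // milliseconds of that item already heard
}

function sendEvent(sink: EventSink, event: object): void {
  sink.send(JSON.stringify(event));
}

// Sends the sequence tried above: cancel, truncate, add text, respond.
function interruptWithText(sink: EventSink, state: PlaybackState, text: string): void {
  if (state.responseInProgress) {
    // Stop generating the current response.
    sendEvent(sink, { type: "response.cancel" });
  }
  if (state.assistantItemId !== null) {
    // Tell the server how much of the assistant's audio was actually heard.
    sendEvent(sink, {
      type: "conversation.item.truncate",
      item_id: state.assistantItemId,
      content_index: 0,
      audio_end_ms: Math.max(0, Math.floor(state.playedMs)),
    });
  }
  // Add the typed message as a user item.
  sendEvent(sink, {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  });
  // Ask the model to answer the typed message.
  sendEvent(sink, { type: "response.create" });
}

// Usage with a logging sink (in the browser, pass the RTCDataChannel instead).
// "item_123" is a placeholder id.
interruptWithText(
  { send: (data) => console.log(data) },
  { responseInProgress: true, assistantItemId: "item_123", playedMs: 2400 },
  "Actually, make it shorter.",
);
```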

Has anyone successfully implemented this functionality?

Thanks!

I’m trying to do the same thing and haven’t found a solution so far. My guess is that by the time you cancel the response, the audio has already been received, so it keeps playing from the output audio buffer.
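
If that guess is right, the client can at least tell the server how much was actually heard, which is what the `audio_end_ms` field of `conversation.item.truncate` carries. A minimal sketch of tracking that position, assuming the app calls `start()` when it believes playback of an assistant item begins (which event marks that depends on your setup and is an assumption here). `PlaybackClock` is a hypothetical helper; the clock is injectable so it can be tested without real time passing.

```typescript
// Returns the current time in milliseconds.
type Clock = () => number;

// Hypothetical helper: tracks which assistant item is playing and since when.
class PlaybackClock {
  private itemId: string | null = null;
  private startedAt = 0;
  private readonly now: Clock;

  constructor(now: Clock = () => Date.now()) {
    this.now = now;
  }

  // Call when playback of an assistant audio item begins.
  start(itemId: string): void {
    this.itemId = itemId;
    this.startedAt = this.now();
  }

  // Call when playback ends or is cut off.
  stop(): void {
    this.itemId = null;
  }

  // Item currently playing and how many ms of it have been heard, or null.
  position(): { itemId: string; playedMs: number } | null {
    if (this.itemId === null) {
      return null;
    }
    return { itemId: this.itemId, playedMs: Math.max(0, this.now() - this.startedAt) };
  }
}
```

On interrupt, `position()` gives the `item_id` and `audio_end_ms` values for the truncate event.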

You could try converting the text to speech with a TTS engine and then sending that file as audio…

Hi OpenAI, this is a serious issue. The same flow worked correctly over WebSocket. Please help.

The best workaround I have found so far is to play a pre-recorded audio file that says “hold on” over the input track. This stops the assistant from speaking immediately. Then, once the response to that voice input comes through as an event, I cancel it and submit the text message. It’s inefficient, but it works as expected.
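
A minimal sketch of the event handling in this workaround, assuming the “hold on” clip has already been played into the mic track and that the next `response.created` event belongs to the model’s reply to that clip. `HoldOnInterrupter` is hypothetical. It waits for `response.done` after cancelling before sending the text, on the assumption that requesting a new response while one is still active may be rejected.

```typescript
interface EventSink {
  send(data: string): void;
}

// Only the field this sketch reads from server events.
interface ServerEvent {
  type: string;
}

// Hypothetical helper: after the clip is played, call queueText(). The next
// response the server starts is cancelled; once it is done, the queued text
// is sent as a user message and a new response is requested.
class HoldOnInterrupter {
  private pendingText: string | null = null;
  private cancelling = false;
  private readonly sink: EventSink;

  constructor(sink: EventSink) {
    this.sink = sink;
  }

  queueText(text: string): void {
    this.pendingText = text;
  }

  // Feed every parsed data-channel event here.
  onServerEvent(event: ServerEvent): void {
    if (event.type === "response.created" && this.pendingText !== null && !this.cancelling) {
      // Drop the model's reply to the "hold on" clip.
      this.cancelling = true;
      this.send({ type: "response.cancel" });
    } else if (event.type === "response.done" && this.cancelling && this.pendingText !== null) {
      // The cancelled response has finished; now submit the typed message.
      const text = this.pendingText;
      this.pendingText = null;
      this.cancelling = false;
      this.send({
        type: "conversation.item.create",
        item: { type: "message", role: "user", content: [{ type: "input_text", text }] },
      });
      this.send({ type: "response.create" });
    }
  }

  private send(event: object): void {
    this.sink.send(JSON.stringify(event));
  }
}
```

Note that the transcribed “hold on” stays in the conversation history as a user turn.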

Ideally, the output audio buffer would be cleared as soon as a text message is received, just as it is for a voice interruption. Hopefully OpenAI agrees and fixes this soon…

Gemini’s Multimodal Live API supports this feature. After migrating to OpenAI Realtime, I’m still facing this problem: sometimes the `input_audio_buffer.clear` event clears the audio buffer and interrupts playback, and other times it doesn’t. I’m not sure why this is happening.

Were you using client mode (handling the input audio manually) or `server_vad` mode?

Sorry for the late response. We’re using the ‘server_vad’ mode since voice is our primary modality, but we also want to accept text.
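
For anyone comparing the two modes: turn detection is configured with a `session.update` client event. A minimal sketch, assuming the beta Realtime API field layout (`session.turn_detection`); newer API versions may nest this setting differently, so check the reference for your version. Setting it to `null` gives the manual (client) mode, in which the app commits the input audio buffer and requests responses itself. The helper names are hypothetical.

```typescript
interface EventSink {
  send(data: string): void;
}

// server_vad: the server detects when speech starts and stops.
// null: "client mode", where the app decides when a user turn ends.
type TurnDetection = { type: "server_vad"; silence_duration_ms?: number } | null;

function setTurnDetection(sink: EventSink, turnDetection: TurnDetection): void {
  sink.send(JSON.stringify({ type: "session.update", session: { turn_detection: turnDetection } }));
}

// In client mode the app ends the turn itself: commit the buffered audio,
// then ask for a response.
function endUserTurn(sink: EventSink): void {
  sink.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
  sink.send(JSON.stringify({ type: "response.create" }));
}
```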

Hi, how do you “cancel the voice response once it’s come through”?

Hello. Did you find a way to handle text input? I’m running into a similar issue, but more on the text side. I’m using WebRTC and handling voice interruptions correctly (audio stops instantly), but the transcript keeps rendering even after the interruption, sometimes showing far more than was actually spoken. Has anyone figured out how to sync or trim the displayed text to match the audio that was actually spoken before the interruption? Is there a workaround or an event that helps align them better?
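
One client-side approximation, not an API feature: stop appending transcript deltas at the moment of the interruption, then cut the displayed text to what was likely spoken, based on how long the audio played. The sketch assumes a steady speaking rate; `visibleTranscript` and its `charsPerSecond` default are hypothetical and need tuning against the voice you use.

```typescript
// Rough estimate of how much of an assistant transcript was spoken before an
// interruption, assuming a steady speaking rate. charsPerSecond is a guessed
// default to tune, not a value from the API.
function visibleTranscript(transcript: string, playedMs: number, charsPerSecond = 14): string {
  const spokenChars = Math.round((Math.max(0, playedMs) / 1000) * charsPerSecond);
  if (spokenChars >= transcript.length) {
    return transcript; // everything was (probably) heard
  }
  // Back off to the previous word boundary so no word is shown half-spoken.
  const boundary = transcript.lastIndexOf(" ", spokenChars);
  return boundary === -1 ? "" : transcript.slice(0, boundary);
}
```

The `playedMs` value can come from whatever the app uses to track playback time.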

Hello, has anybody found a solution for this yet?

Also using WebRTC here, with `server_vad` mode. While the model is reading out a long response, I want to be able to manually send an event, text, or other signal to the server so that it interrupts the response being read and moves on to some new logic. For example, if the user submits an email collection form while the model is still speaking, I want to stop the speech and jump to a new tool, `handle_email_registration`.
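
A minimal sketch of that flow, assuming `handle_email_registration` is already registered as a function tool in the session and that `response.create` accepts a `tool_choice` naming a specific function (check this against the current Realtime API reference). The helper name and the message wording are hypothetical. If the server rejects `response.create` because a response is still active, waiting for `response.done` after the cancel should help, and per this thread the cancel may not stop audio that is already streaming.

```typescript
interface EventSink {
  send(data: string): void;
}

// Hypothetical handler for the email-form case described above.
function onEmailFormSubmitted(sink: EventSink, email: string): void {
  const send = (event: object): void => sink.send(JSON.stringify(event));
  // 1. Stop generating the answer that is currently being read out.
  send({ type: "response.cancel" });
  // 2. Tell the model what happened (wording is illustrative only).
  send({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: `I just submitted the email form with ${email}.` }],
    },
  });
  // 3. Request a response that must call the registration tool
  //    (assumes tool_choice accepts a specific function name).
  send({
    type: "response.create",
    response: { tool_choice: { type: "function", name: "handle_email_registration" } },
  });
}
```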

Hi, how do you stop the audio instantly? I tried setting the volume to 0 and disabling the audio track, and that works. But as soon as I restore the volume or re-enable the track, the unfinished audio from the current response continues.
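
A sketch of one way to avoid the resume, assuming the WebRTC session emits the `output_audio_buffer.stopped` and `output_audio_buffer.cleared` server events (as I understand the Realtime API reference, these are WebRTC-only; verify them for your version): mute immediately on interrupt and unmute only once the server reports that its output audio has ended. `MuteUntilStopped` and the `setMuted` callback are hypothetical.

```typescript
// Hypothetical controller for local playback. setMuted is whatever silences
// the remote audio, e.g. (m) => { audioEl.muted = m; } in the browser.
class MuteUntilStopped {
  private interrupted = false;
  private readonly setMuted: (muted: boolean) => void;

  constructor(setMuted: (muted: boolean) => void) {
    this.setMuted = setMuted;
  }

  // Call together with response.cancel when the user interrupts.
  interrupt(): void {
    this.interrupted = true;
    this.setMuted(true);
  }

  // Feed parsed data-channel events here. Unmute only once the server says
  // its output audio has stopped or been cleared (assumed event names).
  onServerEvent(event: { type: string }): void {
    if (!this.interrupted) {
      return;
    }
    if (event.type === "output_audio_buffer.stopped" || event.type === "output_audio_buffer.cleared") {
      this.interrupted = false;
      this.setMuted(false);
    }
  }
}
```

If the next response starts streaming before either event arrives, its beginning would be muted as well.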

Hey! Yeah, I’m actually interrupting the AI’s speech output using VAD (voice activity detection), but not to detect user silence. I use it to detect when I (the user) start speaking while the AI is still talking, and I use that signal to stop the AI’s audio instantly.

So instead of just muting or disabling the track (which still lets the remaining queued audio continue once it’s re-enabled), VAD lets me detect user speech in real time and fully cut off the AI’s current speech rather than just pausing or muting it. That way, no audio resumes unexpectedly.
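
The post doesn’t say which VAD is used. As an illustration only, here is a simple energy-threshold detector (a swapped-in technique, not the poster’s implementation) that flags speech onset after a few consecutive loud frames, so the app can mute playback and send `response.cancel` without waiting for the server. `EnergyVad` and its threshold and frame-count defaults are hypothetical values to tune.

```typescript
// Simple energy-threshold VAD. threshold is an RMS level for samples in
// [-1, 1]; minFrames is how many consecutive loud frames count as speech.
class EnergyVad {
  private loudFrames = 0;
  private readonly threshold: number;
  private readonly minFrames: number;

  constructor(threshold = 0.02, minFrames = 3) {
    this.threshold = threshold;
    this.minFrames = minFrames;
  }

  // Returns true only on the frame where speech onset is detected, i.e.
  // after minFrames consecutive frames at or above the threshold.
  process(frame: Float32Array): boolean {
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) {
      sumSquares += frame[i] * frame[i];
    }
    const rms = frame.length > 0 ? Math.sqrt(sumSquares / frame.length) : 0;
    if (rms < this.threshold) {
      this.loudFrames = 0; // silence resets the run
      return false;
    }
    this.loudFrames++;
    return this.loudFrames === this.minFrames;
  }
}
```

In the browser, frames could come from an `AnalyserNode` on the mic stream via `getFloatTimeDomainData`.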

This is critical for us. Has anyone discovered a solution? We need a way to halt the speech by sending a text event over the data channel. Since our model only receives audio input and we don’t have permission to open the mic, the speech keeps playing and the user can’t stop it.
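
One thing worth trying, stated as an assumption rather than a confirmed fix: the Realtime API reference, as I understand it, lists a WebRTC-only client event `output_audio_buffer.clear` that cuts off the current audio response and should be preceded by `response.cancel`. A minimal sketch that sends both from a UI action, with no microphone involved; `stopSpeaking` is a hypothetical name, and `stopButton` and `dataChannel` in the comment are assumed variables.

```typescript
interface EventSink {
  send(data: string): void;
}

// Stops the assistant from a UI action, e.g.
//   stopButton.addEventListener("click", () => stopSpeaking(dataChannel));
// Assumption: the WebRTC session supports output_audio_buffer.clear.
function stopSpeaking(sink: EventSink): void {
  // Stop generating the rest of the response first...
  sink.send(JSON.stringify({ type: "response.cancel" }));
  // ...then ask the server to cut off the audio it is still streaming.
  sink.send(JSON.stringify({ type: "output_audio_buffer.clear" }));
}
```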

I’m experiencing the same issue. The template has a `cancelAssistantResponse` function, but sending `conversation.item.truncate` or `response.cancel` from it doesn’t work at all. Please help.
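
To see why those events appear to do nothing, it may help to log the server’s `error` events, which, as far as I know, are how the Realtime API reports rejected client events (for example, a truncate that references an item id the server does not recognize). A minimal sketch of a parser for data-channel messages; `describeServerError` is hypothetical and reads only a few fields of the error payload.

```typescript
// Fields of the "error" server event read here (others are ignored).
interface ErrorDetails {
  type?: string;
  code?: string;
  message?: string;
  event_id?: string;
}

// Returns a readable description if a data-channel message is an error
// event, or null for any other event.
function describeServerError(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return `unparseable message: ${raw.slice(0, 80)}`;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const event = parsed as { type?: unknown; error?: ErrorDetails };
  if (event.type !== "error") {
    return null;
  }
  const err: ErrorDetails = event.error ?? {};
  const label = err.code ?? err.type ?? "unknown";
  const source = err.event_id ? ` (client event ${err.event_id})` : "";
  return `[${label}] ${err.message ?? "(no message)"}${source}`;
}

// Browser usage (assumed data channel variable name):
// dataChannel.addEventListener("message", (e) => {
//   const problem = describeServerError(e.data);
//   if (problem !== null) console.warn(problem);
// });
```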