Interrupt realtime audio with text message - WebRTC

dordonne.thomas · December 26, 2024, 11:22am

Hi everyone,

I’ve been working with the WebRTC template.

When I interrupt the model using my voice, the audio output from the server stops immediately and restarts once I’ve finished speaking, as expected.

I want to replicate this behavior for text input: when sending a text message, I’d like the audio to stop immediately and then restart to respond to the text.

I’ve tried using response.cancel, conversation.item.truncate, then conversation.item.create with input_text, and finally response.create, but the audio continues playing until the end of the transcript.

Has anyone successfully implemented this functionality?

Thanks!

techno_optimist · January 8, 2025, 7:59pm

I’m trying the same thing. I have not found a solution so far. I guess the problem is that the audio has already been received when you cancel the response and therefore it is still played from the audio output buffer.

Foxalabs · January 8, 2025, 11:29pm

You could try converting the text to speech with a TTS engine and then sending that file as audio…

uri-tinytap · January 9, 2025, 4:23pm

Hi OpenAI, this is a serious issue. It worked correctly over WebSocket. Please help.

rockettnet · February 7, 2025, 9:43pm

The best workaround I have found so far is to play a pre-recorded audio file over the input track that says “hold on.” This stops the assistant from speaking immediately. Then I cancel the voice response once it’s come through as an event and then I submit the text message. It’s inefficient but works as expected.

Ideally the output audio buffer should be cleared as soon as a text response is received (like a vocal interruption). Hopefully OpenAI agrees and fixes this soon…

ariel042cohen · February 7, 2025, 10:10pm

The Gemini Multimodal Live API supports this feature. After migrating to OpenAI Realtime, I’m still facing this problem: sometimes the input_audio_buffer.clear event works to clear the audio buffer and interrupt it, and other times it doesn’t. I’m not sure why this is happening.

shivam42ai · March 20, 2025, 8:01am

Was it using client mode (manually handling the input audio) or the server_vad mode?

rockettnet · March 28, 2025, 9:36pm

Sorry for the late response. We’re using the ‘server_vad’ mode since voice is our primary modality, but we also want to accept text.

boss666 · April 10, 2025, 12:21pm

Hi, how do you “cancel the voice response once it’s come through”?

kivseddy · April 10, 2025, 1:23pm

Hello. Did you find a way for the text input? I’m running into a similar issue, but more focused on the text side. I’m using WebRTC and handling voice interruptions correctly (audio stops instantly), but the text transcript continues rendering even after the interruption, sometimes showing far more than was actually spoken. Has anyone figured out how to sync or trim the text display to match the audio actually spoken before the interruption? Is there any workaround or event that helps align them better?

grantlicomm100 · April 10, 2025, 8:29pm

Hello, has anybody found a solution for this yet?

I’m also using WebRTC here with server_vad mode. When the model is reading a long response, I want to be able to manually send an event/text/signal to the server so that it interrupts the response currently being read and moves on to some new logic (e.g. if the user submits an email collection form while the model is still speaking, I want to stop the speech and jump to a new tool, handle_email_registration).

boss666 · April 11, 2025, 3:44am

Hi. How do you stop the audio instantly? I tried setting the volume to 0 and disabling the audio track. That works, but if I restore the volume or re-enable the audio track, the unfinished audio from the current response continues playing.

kivseddy · April 13, 2025, 1:17pm

Hey! Yeah, I’m actually interrupting the AI’s speech output using VAD (Voice Activity Detection) — but not for detecting user silence. I use it to detect when I (the user) start speaking while the AI is still talking. Once that happens, I use that signal to instantly stop the AI’s audio.

So instead of just muting or disabling the track (which still lets the remaining audio queue continue if re-enabled), VAD lets me detect real-time user speech and fully cut off the AI’s current speech — not just pause or mute it. That way, no audio resumes unexpectedly.

HatemKhalil · April 19, 2025, 10:57pm

This is critical for us—has anyone discovered a solution? We need a way to halt the speech by sending a text event over the data channel. Since our model only receives audio input and we don’t have permission to open the mic, the speech keeps playing and the user can’t stop it.

leyo · April 22, 2025, 12:13am

I am experiencing the same issue. In the template there is a cancelAssistantResponse function, but it does not work at all when sending an event of type “conversation.item.truncate” or “response.cancel”. Please help.

Tom_Kail · May 1, 2025, 9:28am

Confirming we’re seeing this too. The openai-realtime-agents demo has the same issue: the cancelAssistantSpeech function is missing the “output_audio_buffer.clear” call that is required to get the AI to actually stop talking (near) immediately. Even when that’s added, the AI will sometimes resume speaking afterwards from a point nearer the end of its message.

Looks like it’s logged here, but nobody has gotten back on it.

github.com/openai/openai-realtime-agents

AI Audio Does Not Truncate on Text Message but Stops on VAD Input

opened 04:10AM - 13 Feb 25 UTC

nischay-chauhan

When the AI is speaking a long response, sending a text message does not interrupt the ongoing speech. However, if I interrupt using voice input (via VAD), the audio stops correctly and the new response begins. This inconsistency makes the interaction feel unresponsive when using text input.

Expected Behavior: The AI should stop speaking immediately when a new text message is sent, just like it does when interrupted via voice input (VAD). The new message should be processed, and its corresponding speech should begin without waiting for the previous speech to finish.

Actual Behavior: The AI continues speaking the previous response even after a new text message is sent. The new message is not spoken until the previous speech completes. However, if I interrupt using voice input (VAD), the audio stops as expected, and the AI responds immediately.

I have tried it in browsers such as Brave, Chrome, and Firefox. Does anybody have an idea how to resolve it?
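
For readers landing here, a minimal sketch of the cancel-then-clear sequence Tom_Kail describes, assuming client events are sent as JSON strings over the WebRTC data channel. `EventChannel` and `stopAssistantSpeech` are hypothetical names made up for illustration and are not the demo's actual code; only the event types `response.cancel` and `output_audio_buffer.clear` come from this thread.

```typescript
// Minimal shape of the data channel used for Realtime events.
// A browser RTCDataChannel satisfies it structurally (assumption: events
// are exchanged as JSON strings over that channel).
interface EventChannel {
  send(data: string): void;
}

// Hypothetical helper: ask the server to stop the assistant's speech.
function stopAssistantSpeech(channel: EventChannel): void {
  // Stop the model from generating more audio for the in-progress response.
  channel.send(JSON.stringify({ type: "response.cancel" }));
  // Drop audio that is already buffered for playback but not yet played.
  channel.send(JSON.stringify({ type: "output_audio_buffer.clear" }));
}
```

In a browser, the `RTCDataChannel` carrying the Realtime events could be passed in directly. As noted in the post above, this alone may not prevent the model from resuming speech later in every case.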

lenduya · June 10, 2025, 3:23am

How do you send an audio file with WebRTC?

rockettnet · June 10, 2025, 5:25pm

UPDATE: I’ve discovered that using “output_audio_buffer.clear” works to clear the output stream if the agent is talking (but you have to cancel the active response first if there is one). Not sure if this was fixed recently or if there is simply better documentation available, but my workaround is no longer needed.
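
A rough sketch of how the update above could be wired up for the original question (interrupting with a text message). It assumes the client tracks whether a response is active from the `response.created` and `response.done` server events, and that sending `output_audio_buffer.clear` is harmless when nothing is playing. `EventChannel` and `TextInterrupter` are hypothetical names for illustration, not part of any SDK.

```typescript
// Anything with a string send(); a browser RTCDataChannel fits this shape.
interface EventChannel {
  send(data: string): void;
}

// Hypothetical helper that interrupts speech before submitting text.
class TextInterrupter {
  private channel: EventChannel;
  private responseActive = false;

  constructor(channel: EventChannel) {
    this.channel = channel;
  }

  // Feed every parsed server event here to track the active response.
  onServerEvent(event: { type: string }): void {
    if (event.type === "response.created") this.responseActive = true;
    if (event.type === "response.done") this.responseActive = false;
  }

  // Stop current speech, then submit the text and request a new response.
  sendText(text: string): void {
    // Cancel only when a response is in progress, as the update describes.
    // The flag is reset when the server reports response.done.
    if (this.responseActive) {
      this.emit({ type: "response.cancel" });
    }
    // Clear audio still queued for playback (assumed safe if nothing plays).
    this.emit({ type: "output_audio_buffer.clear" });
    this.emit({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    });
    this.emit({ type: "response.create" });
  }

  private emit(event: object): void {
    this.channel.send(JSON.stringify(event));
  }
}
```

Usage would look something like creating one `TextInterrupter` per session, calling `onServerEvent(JSON.parse(e.data))` from the data channel's message handler, and calling `sendText(...)` when the user submits a message.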
