[Realtime API] Audio is randomly cutting off at the end

jordan-b · November 12, 2024, 2:34pm

The same thing is happening for us, using a Ruby client we wrote ourselves. We can provide logs/details if that would be helpful.

english123 · November 13, 2024, 5:12am

Thank you! I am facing the same issue. It cuts off the end of responses, making it totally unusable. I’ve reverted to 4o mini and Whisper because that setup actually says the full sentence.

alex105 · November 16, 2024, 6:05am

We’re still experiencing this too, and it’s really a big problem: there are a number of real production use cases we’d like to use the Realtime API for, but the mid-sentence cut-offs are just not acceptable for production use.

Which is too bad, because this API is incredible next generation tech and not being able to use it for real customers is a shame!

I’d greatly appreciate a fix; even an indication of when one might be realistic would be very helpful.

ashwinnayak · November 16, 2024, 6:12pm

Completely agree. I’ve resorted to using the Realtime API constrained to text-only output and then adding a final TTS step. It adds latency, of course, but it’s a lot more stable.

anon10827405 · November 16, 2024, 6:19pm

If it helps, I use AVM (Advanced Voice Mode) almost daily in the ChatGPT app, and it’s always cutting off at the end as well, so I doubt there’s any solution besides waiting.

alex105 · November 16, 2024, 10:37pm

Interesting, Ashwin, do you mean that you grab the transcript of what the Realtime API is outputting and then run it through TTS on the side? Thanks for the idea.

robertgr · November 17, 2024, 4:27pm

Any findings so far? I am reading in this thread about people using text-only Realtime + TTS to work around this…

A typical conversation for us these days:

“Ey, can you tell me what is the energy consumption right now in the building?”
[In the log we see the LLM doing all the magic in the background with its tools (it retrieves real-time information from field devices, uses a Python environment to do some basic math, all with crazy low latency…) and then it answers:] “I have retrieved the consumption of xyz devices and calculated that the consumption of the building right now is …” And then the audio message cuts off.

And we are paying for that, to be clear…

ashwinnayak · November 17, 2024, 7:57pm

I disable audio output (https://platform.openai.com/docs/api-reference/realtime-client-events/session/update), so it’s just responding with a text completion (not technically a transcript). And then, yes, I send that to TTS.

So it’s user audio → AI text → AI speech instead of user audio → AI audio, but that’s still faster than the traditional user audio → user text → AI text → AI audio.
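A minimal sketch of the text-only setup described above, assuming the beta Realtime API event shapes (`session.update` with a `session.modalities` list, and `response.text.delta` events carrying a `delta` string). Later API versions may rename these fields, so check the current reference. The sketch only builds and parses JSON; the WebSocket connection and the TTS call are left out.

```python
import json


def text_only_session_update() -> str:
    """Build a session.update event that restricts model output to text."""
    event = {
        "type": "session.update",
        "session": {
            # Only "text": the model streams text deltas and no audio.
            "modalities": ["text"],
        },
    }
    return json.dumps(event)


def collect_text(events: list) -> str:
    """Concatenate the streamed text deltas of one response, ignoring other events."""
    parts = []
    for event in events:
        if event.get("type") == "response.text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


if __name__ == "__main__":
    print(text_only_session_update())
    sample = [
        {"type": "response.text.delta", "delta": "Hello, "},
        {"type": "response.text.delta", "delta": "world."},
        {"type": "response.done"},
    ]
    print(collect_text(sample))  # Hello, world.
```

The collected text is what would then be handed to a separate TTS service.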

alex105 · November 17, 2024, 11:14pm

Oh awesome, thanks very much for the tip. Hopefully this kind of hack won’t be necessary for much longer, but I appreciate the pointer on how to get this thing into production ASAP…

ketanavaamo · November 20, 2024, 2:05pm

@ashwinnayak How much delay do you notice with the speech-to-text plus text-to-speech approach compared to the original speech-to-speech approach?

ashwinnayak · November 21, 2024, 7:22pm

It really just depends on the TTS implementation: the added latency is the time to the first audio chunk. You can also stream the text, sending text chunks to TTS as they are received from the Realtime API.
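Streaming text to TTS as it arrives can be sketched with a small buffer that forwards complete sentences as soon as they are available. This is an illustration, not anyone’s production code: `SentenceChunker` and the `speak` callback are hypothetical names, and `speak` stands in for whatever TTS client you use.

```python
import re
from typing import Callable

# Split points: whitespace that directly follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Buffer streamed text deltas and pass complete sentences to `speak`."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak  # hypothetical TTS callback
        self._buffer = ""

    def feed(self, delta: str) -> None:
        """Add a text delta; forward every sentence that is now complete."""
        self._buffer += delta
        parts = _SENTENCE_END.split(self._buffer)
        # Every part except the last ends in sentence punctuation.
        for sentence in parts[:-1]:
            if sentence.strip():
                self._speak(sentence.strip())
        # Keep the unfinished remainder for the next delta.
        self._buffer = parts[-1]

    def flush(self) -> None:
        """Send whatever is left once the response is done."""
        if self._buffer.strip():
            self._speak(self._buffer.strip())
        self._buffer = ""


if __name__ == "__main__":
    spoken = []
    chunker = SentenceChunker(spoken.append)
    for delta in ["Consumption is ", "3.5 kW. Anything ", "else?"]:
        chunker.feed(delta)
    chunker.flush()
    print(spoken)  # ['Consumption is 3.5 kW.', 'Anything else?']
```

Splitting only on punctuation followed by whitespace keeps decimals such as "3.5" intact, although abbreviations like "e.g." will still cause an early split. Sentence-sized chunks add a little latency compared with word-by-word synthesis but usually give the TTS engine enough context for natural intonation.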

robertgr · November 22, 2024, 10:20am

It is sad, though, that for such an expensive API we have to work around this with TTS. Regardless of the latency you might achieve with this approach, it defeats the whole purpose of having GPT-4o itself provide the right intonation and pacing for its speech…

robertb · November 22, 2024, 11:08am

I’m not sure if any of this is relevant to your issue, but here are my observations:

I’ve been experimenting with the Realtime API for a rather unique use case: an Unreal Engine Realtime API plugin for talking 3D avatars. In our initial tests, the Realtime voice frequently cut off. After some investigation, I discovered the issue stemmed from the Voice Activity Detection (VAD) mistaking the model’s own speech for mine, likely because the Unreal AudioCapture setup lacks active microphone cancellation. I resolved the problem by adjusting the VAD sensitivity to 0.8 and using a narrow-pickup microphone. This configuration worked reliably in a point-of-contact setup.

In our second experiment, we developed 2D talking avatars in browsers. For this, we used the audio backend from OpenAI’s Console open-source demo, which is stable on the same desktop computer, likely thanks to better microphone cancellation in the browser (although I’m not entirely sure). That said, there were still very occasional instances where the Realtime voice cut off, typically when the desktop speakers were loud enough to cause interference.
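If the "VAD sensitivity" in the Unreal setup corresponds to the `threshold` field of server VAD (an assumption; the post does not name the parameter), the session update would look roughly like the sketch below. In the beta API, `threshold` runs from 0.0 to 1.0 (default 0.5), and a higher value requires louder audio before speech is detected, which is what helps when the avatar’s own voice leaks back into the microphone. `vad_session_update` is a hypothetical helper name.

```python
import json


def vad_session_update(threshold: float = 0.8) -> str:
    """Build a session.update event that raises the server VAD threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return json.dumps({
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Higher values need louder audio to count as speech, so quieter
                # echo from the speakers is less likely to interrupt the model.
                "threshold": threshold,
            }
        },
    })


if __name__ == "__main__":
    print(vad_session_update(0.8))
```

The trade-off is that a high threshold can also miss quiet or distant user speech, which is presumably why a narrow microphone close to the speaker worked well alongside it.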

dnna · November 22, 2024, 11:23am

This might be one of the causes, but it’s not the only one: in our case we use push-to-talk with VAD disabled, and it still cuts off occasionally. Often it’s the content filter mischaracterizing something and cutting off, but there have been occasions where there is no error and the audio simply cuts off towards the end.
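For reference, a push-to-talk setup like this one typically disables server VAD by setting `turn_detection` to null, then commits the input audio buffer and requests a response explicitly when the user releases the button. A minimal sketch of those client events, assuming the beta Realtime API event names; the helper functions are hypothetical.

```python
import base64
import json


def disable_server_vad() -> str:
    """session.update that turns off automatic turn detection."""
    # json.dumps serializes None as null, which disables server VAD.
    return json.dumps({"type": "session.update", "session": {"turn_detection": None}})


def push_to_talk_turn(pcm16_chunks: list) -> list:
    """Client events for one turn, ending when the user releases the button.

    A real client streams the appends while the button is held;
    they are batched here for brevity.
    """
    events = [
        json.dumps({
            "type": "input_audio_buffer.append",
            # Audio travels base64-encoded inside the JSON payload.
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
        for chunk in pcm16_chunks
    ]
    # Without VAD, the client decides where the turn ends...
    events.append(json.dumps({"type": "input_audio_buffer.commit"}))
    # ...and must ask for the response itself.
    events.append(json.dumps({"type": "response.create"}))
    return events
```

With this flow the server never has to guess where the user’s turn ends, so cut-offs that still occur point to something other than VAD, which matches the observation above.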

robertgr · November 22, 2024, 11:24am

Negative. It happens with headphones too, and, as many others say, it’s always the very last few words of a message. We notice it easily because in our case those last few words usually contain critical information, e.g. a temperature value.

robertgr · November 22, 2024, 11:27am

I am convinced it’s a matter of the audio not filling enough of the last streaming packet in some buffer in their pipeline, and then the message being dropped. That’s why it happens sometimes and not others: it depends on the message size and how it fits into packets.
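Whatever happens server-side, the same failure mode can occur in a client’s own playback code: a player that only emits full fixed-size frames silently drops the final partial frame unless it flushes it when the stream ends. Below is a hypothetical `FrameBuffer` that pads the tail with silence instead, assuming 16-bit PCM. Note that one lost frame is only a few milliseconds of audio, so losing whole words would point to something larger, such as playback being stopped before the buffered audio has finished.

```python
class FrameBuffer:
    """Re-chunk streamed 16-bit PCM into fixed-size frames without losing the tail."""

    def __init__(self, frame_bytes: int) -> None:
        if frame_bytes <= 0 or frame_bytes % 2:
            raise ValueError("frame_bytes must be a positive even number (16-bit samples)")
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add decoded audio; return every complete frame now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[:self.frame_bytes]))
            del self._pending[:self.frame_bytes]
        return frames

    def flush(self) -> list:
        """Call when the audio stream ends; pads the partial frame instead of dropping it."""
        if not self._pending:
            return []
        # Zero bytes are digital silence for signed 16-bit PCM.
        padding = b"\x00" * (self.frame_bytes - len(self._pending))
        tail = bytes(self._pending) + padding
        self._pending.clear()
        return [tail]
```

For 24 kHz, 16-bit mono audio, a 20 ms frame is 480 samples, i.e. `FrameBuffer(960)`. `flush()` belongs on whichever event marks the end of the audio stream (for example `response.audio.done` in the beta API).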

robertb · November 22, 2024, 11:39am

This might well happen “occasionally” in our case too. Of course, the Realtime API is in “Preview”, but it still works well enough that we’re pushing a new project into production in the next couple of weeks, with a “preview” disclaimer on the voice. Our interface is a hybrid voice/text chat, so the user can always switch to text chat if voice fails for some reason.

robertgr · November 22, 2024, 11:55am

Agree, robertb. Depending on the use case, this might well be production-ready. For us, it’s not really usable beyond PoCs and demonstrations.

Many times we’ve had situations where the AI’s answer is:

“I calculated the average current consumption of xzy equipment devices, and it is …” and the values are lost. Sure, we get the transcription and could show it, but that’s not the idea.

jordan-b · November 22, 2024, 6:38pm

I’ve had this happen to me while using the ChatGPT iOS app, so it seems to affect OpenAI themselves as well. I had mute enabled at the time, so it wasn’t due to noise on my end or anything. The transcript had the full reply, but the voice got cut off.
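Since the transcript arrives intact while the audio does not, one way to gather evidence for reports like this is to log, per response, how much audio was actually received next to the transcript and the final `response.done` status (which also surfaces content-filter stops like the ones mentioned earlier in the thread). A sketch, assuming the beta event names, PCM16 output at 24 kHz mono, and a hypothetical words-per-second cutoff that you would tune yourself.

```python
import base64
from dataclasses import dataclass
from typing import Optional

SAMPLE_RATE_HZ = 24_000  # assumption: pcm16 output, 24 kHz mono
BYTES_PER_SAMPLE = 2


@dataclass
class ResponseAudit:
    """Accumulates what actually arrived for a single response."""
    audio_bytes: int = 0
    transcript: str = ""
    status: Optional[str] = None
    reason: Optional[str] = None

    def observe(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.audio.delta":
            # Audio deltas are base64-encoded; count the decoded bytes.
            self.audio_bytes += len(base64.b64decode(event["delta"]))
        elif kind == "response.audio_transcript.delta":
            self.transcript += event["delta"]
        elif kind == "response.done":
            response = event.get("response", {})
            self.status = response.get("status")
            # status_details may be null, so fall back to an empty dict.
            self.reason = (response.get("status_details") or {}).get("reason")

    @property
    def audio_seconds(self) -> float:
        return self.audio_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

    def looks_truncated(self, max_words_per_second: float = 4.0) -> bool:
        """Heuristic flag; 4 words/s is a hypothetical cutoff, not a documented value."""
        if self.status == "incomplete":
            return True
        words = len(self.transcript.split())
        if self.audio_seconds == 0:
            return words > 0
        return words / self.audio_seconds > max_words_per_second
```

A response with a ten-word transcript but only one second of received audio would be flagged, while the same transcript over five seconds would not. Logging `audio_seconds`, `status` and `reason` alongside the transcript also helps separate client-side playback problems from audio that never arrived.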

lucasvan · November 26, 2024, 8:44am

I am also experiencing this issue in my Python-based Twilio / Realtime API app, built on the bootstrap code you can find online (I can’t share a link in this forum for some reason…).

From my experiments, it’s definitely on the Realtime API side, not on Twilio’s. But I just wanted to check with the group: are people experiencing this with a pure Realtime API app, without Twilio?

@robertgr :

“I calculated the average current consumption of xzy equipment devices, and it is …” and values are lost. Sure, we get transcription and could show it but not the idea.

This is kind of interesting, since I have a similar line that is cut off at exactly the same point:

“Hello, are you calling regarding ‘…’?” And then the ‘…’ is dropped. In the transcript, nothing is dropped.
