[Realtime API] Audio is randomly cutting off at the end

@gokulraya do you please have an update on this matter?

It seems to affect practically everyone…

I was experiencing the same problem while preparing the demo code for my simple-openai Java library. I was able to fix it, after a brief analysis:

  1. After I finish speaking, my code sends a response.create request.
  2. The AI sends audio fragments via several response.audio.delta events, followed by a response.audio.done event when it finishes.
  3. My code handles those two types of events: it plays each audio delta and stops the speakers when the response.audio.done event arrives.
  4. Because handling the audio deltas takes longer than handling the done event, the speakers can be stopped before the last deltas have been played. Adding a short delay before stopping the speakers solved the problem.

Here you can see the code that shows how it works:

You should be able to extrapolate this fix to your own language/framework.
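
The steps above can be sketched in Java roughly as follows. This is not the code from simple-openai, and it swaps the fixed delay for a related technique: the stop request is queued behind every pending audio delta on a single worker thread, so the speakers stop only after the last fragment has been handled. The class name OrderedPlayback and its two callbacks are hypothetical; with javax.sound.sampled they might wrap SourceDataLine.write and drain() followed by stop().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: plays audio deltas in order and stops only after the last one. */
final class OrderedPlayback {
    // A single worker thread runs tasks strictly in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Assumed callback, e.g. bytes -> line.write(bytes, 0, bytes.length)
    private final Consumer<byte[]> speaker;
    // Assumed callback, e.g. () -> { line.drain(); line.stop(); }
    private final Runnable stopSpeaker;

    OrderedPlayback(Consumer<byte[]> speaker, Runnable stopSpeaker) {
        this.speaker = speaker;
        this.stopSpeaker = stopSpeaker;
    }

    /** Call for each response.audio.delta, after Base64-decoding its payload. */
    void onAudioDelta(byte[] audio) {
        worker.execute(() -> speaker.accept(audio));
    }

    /** Call for response.audio.done: the stop is queued behind all pending deltas. */
    void onAudioDone() {
        worker.execute(stopSpeaker);
    }

    /** Waits for queued work to finish; returns false on timeout or interruption. */
    boolean close(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Wire onAudioDelta to each decoded response.audio.delta payload and onAudioDone to response.audio.done. This only helps when the cut-off comes from stopping playback too early; it cannot recover audio that never arrived.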

I will revisit this, but if you check the thread, some of us have already tried this, and it still does not seem to work. It really looks like the audio never arrives, so no fix is possible on the client side.

Here: [Realtime API] Audio is randomly cutting off at the end - #13 by robertgr

But as I wrote later, that didn’t quite fix it; I had just coincidentally failed to reproduce it. Did you try enough times to be sure it is no longer happening in your case? Or is it still happening anyway?

@robertgr The problem occurred consistently and after the fix it did not happen consistently anymore.

It is worth mentioning that my code is at the backend level calling the Realtime API.

This is incredibly frustrating. I hope the OpenAI developers are aware of this issue. I’ve been working on something for months, and now I’m just waiting to deploy it to production once the audio interruptions are resolved.

We are also facing the same issue. If someone finds a solution, please share it.

I’m also having the exact same problem using NodeJS + Twilio + Server VAD.

Has anyone solved this by switching to manual VAD?

I’m experiencing the same issue, with the last bit of audio often being cut off, on Azure OpenAI:

  • using my own custom client, using the WS API directly
  • using the auto VAD
  • have not tried manual VAD
  • it happens both with and without the content filter on
  • it appears to be random and I cannot reproduce it

The realtime API is in Beta and is not suitable for a production environment.

That’s understandable, but an admission that a bug exists would allow developers to continue while waiting for the production release or a fix. Otherwise many are wasting a lot of time chasing a known issue.

I haven’t experienced this at all, but I’ve only used WebRTC.

This issue is still present in the latest version of the API. Has anyone managed to find a workaround?

I don’t believe there will be a workaround. I wrote the entire client in Java myself implementing the same flow as was recommended above and I still see this from time to time (maybe every 3rd conversation for me)

I have the same problem myself. I’m invoking Azure OpenAI via my Python backend and then sending the audio bytes to a JS frontend. I thought I was doing something wrong when playing the audio in the frontend, as I have almost no experience with JS (yet!). But now I can see this is happening to everyone.

Using what interface? WebSockets or WebRTC?
I see @anon37218972 says it does not happen at all using WebRTC.
WebRTC is better suited to Realtime voice and similar use cases than WebSockets.

They even say so in:

Other than saying that you need to handle more yourself, do they explicitly say that WebRTC is better? My code is a Twilio → Java server-to-server app, where they recommend WebSockets.

That’s true; however, they do “support” the WebSockets API, and there are situations where not needing a separate front end is desirable for integration into existing code bases. Also, I have two versions of my code, WebSockets and WebRTC. The WebRTC version does still seem to cut off responses, but only infrequently, whereas the WebSocket version does so every few responses.

Mine only does this maybe once every 4-5 conversations. I would say that if you sometimes see the issue on WebRTC, it probably happens about as frequently as in my WebSockets implementation. It doesn’t seem to be a problem on any of our sides, since the transcript completes fully and everyone is having this issue…
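
One way to check whether the missing audio ever reaches the client is to total the bytes carried by the response.audio.delta events and compare the resulting duration with how long the transcript should take to speak. Below is a minimal Java sketch of that idea. The class name ReceivedAudioMeter is hypothetical, and the arithmetic assumes mono audio: use 24000 Hz with 2 bytes per sample for pcm16, or 8000 Hz with 1 byte per sample for the G.711 formats typical of Twilio setups.

```java
import java.util.Base64;

/** Hypothetical diagnostic: totals the audio received in response.audio.delta events. */
final class ReceivedAudioMeter {
    private final int sampleRateHz;
    private final int bytesPerSample;
    private long totalBytes;

    ReceivedAudioMeter(int sampleRateHz, int bytesPerSample) {
        this.sampleRateHz = sampleRateHz;
        this.bytesPerSample = bytesPerSample;
    }

    /** Adds one delta; the event's "delta" field is Base64-encoded audio. */
    void addDelta(String base64Audio) {
        totalBytes += Base64.getDecoder().decode(base64Audio).length;
    }

    /** Raw number of audio bytes received so far. */
    long totalBytes() {
        return totalBytes;
    }

    /** Duration of the received audio in milliseconds (mono assumed). */
    long durationMillis() {
        return totalBytes * 1000L / ((long) sampleRateHz * bytesPerSample);
    }
}
```

If the received duration is clearly shorter than the spoken transcript, the tail of the audio was never delivered; if it is long enough, the cut-off is more likely happening during playback on the client.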

I’m using WebSockets. I understand that WebRTC might be more suitable for a client-side application. However, I’ve also tested Google’s Multimodal Live API, which uses WebSockets, and I don’t have any problems with random audio cutoffs there.
