[Realtime API] Audio is randomly cutting off at the end

Sometimes (roughly every 5th request) the audio cuts off at the end, but we still receive the whole transcription in the console.

Language [Kazakh, Russian]

6 Likes

I also experience this, in English. Not sure about the frequency (e.g. every 5th request), but it definitely happens every once in a while. It's always towards the end of sentences. Are you sending any response.cancel requests? My suspicion on my end was that it is a bug in my application code that's calling response.cancel incorrectly/prematurely, before the playback can finish.
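In case it helps anyone checking the same thing, here is a minimal sketch of how one might gate the cancel call - assuming a raw async websocket client (e.g. the websockets package); ResponseTracker and its methods are hypothetical names of mine, not from any SDK:

import json

class ResponseTracker:
    """Only send response.cancel while a response is actually streaming."""

    def __init__(self, ws):
        self.ws = ws          # an already-open async websocket to the Realtime API
        self.active = False   # True between response.created and response.done

    def on_server_event(self, event: dict):
        # Feed every decoded (JSON) server event through here.
        if event["type"] == "response.created":
            self.active = True
        elif event["type"] == "response.done":
            self.active = False

    async def cancel_if_active(self):
        # Cancelling after the response has already finished can itself look
        # like the audio being cut off, so skip the call in that case.
        if self.active:
            await self.ws.send(json.dumps({"type": "response.cancel"}))
            self.active = False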

2 Likes

I'm not doing any response.cancel events and experiencing this too. Can't confidently say anything about the frequency but it's VERY frequent

1 Like

Happens for me too. The voice cuts off randomly towards the end of the last sentence.

1 Like

Did anyone find a solution? I'm using the Twilio API and Node.js, and the audio is being cut off in basically every other response. Very frustrating.

1 Like

This bug is related to the content filter.
(edit: it seems to be happening not just with content filter, but randomly as well)
Currently about 5-10% of messages are incorrectly classified and cut off.
The weird thing is that they don't reject the request; they only stop streaming data at the end. It makes the UX really bad.

Potentially, using another TTS like ElevenLabs will be necessary.

1 Like

I also encounter this issue. I do not use response.cancel anywhere; it randomly responds with incomplete audio, which severely degrades the experience.

1 Like

I've also been encountering this issue a lot. I'm using AzureOpenAI and a modified version of the aoai-realtime-audio-sdk python rtclient.

Some observations:

  • Tends to occur most frequently on the first audio response, sometimes up to 50% of the time
  • The audio transcript returned by the realtime api is always complete
  • I've verified that the audio I'm receiving from the realtime api is truncated by sending it directly to another STT, which also produces a truncated transcript - so it's not an issue with how I'm sending audio back to my client (see the sketch after this list)
  • I've manually inspected all the websocket messages received from the realtime api - there are no errors. A user above said sometimes the content filter triggers this, but per the OpenAI docs the content moderation cutoff would result in an error event - which I definitely do not receive when this issue occurs
  • Even when I remove all session updates, conversation deletes, response cancellations, etc. - so literally all I'm sending to the api is input audio buffer append messages - I still encounter this issue
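For reference, the truncation check mentioned above boils down to something like this - a minimal sketch assuming the default pcm16 output (16-bit mono at 24 kHz) and plain decoded JSON events; the accumulation code is mine, not from any SDK:

import base64

SAMPLE_RATE = 24000     # assuming the default Realtime API output: 16-bit mono PCM at 24 kHz
BYTES_PER_SAMPLE = 2

audio_bytes = bytearray()
transcript = ""

def on_server_event(event: dict):
    """Accumulate audio deltas and compare the implied duration with the transcript."""
    global transcript
    if event["type"] == "response.audio.delta":
        audio_bytes.extend(base64.b64decode(event["delta"]))
    elif event["type"] == "response.audio_transcript.done":
        transcript = event["transcript"]
    elif event["type"] == "response.done":
        seconds = len(audio_bytes) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
        # A complete transcript paired with suspiciously little audio suggests
        # the truncation happens server-side, before the bytes ever reach us.
        print(f"received {seconds:.2f}s of audio for transcript: {transcript!r}")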

Hoping they'll fix this soon or at least identify a reason/pattern because I have no idea why this is happening

5 Likes

Same issue. I am sure they will fix it soon, as it appears to be an issue inherent to the model / api itself.

2 Likes

Same here. And guys, this is not random - it seems to depend on the length of the audio response imo; that's why it looks random. I am also using the AzureOpenAI aoai-realtime-audio-sdk (version 0.5.2 that I built; the previous one was not correctly handling function call responses…)

EDIT: IN FACT THIS DIDN'T SOLVE THE PROBLEM. THE LAST 0.5-1 SECONDS OF AUDIO IS MISSING MOST OF THE TIME NOW. PLEASE FIX THIS OPENAI.

In fact I think I found the problem. I really don't have time to dig into the code lol. It's funny that some still think we are heading, in 2-3 years, to a future of total abundance just like magic. Without a massive human effort…

In the rtclient the RTAudioContent class:

async def audio_chunks(self) -> AsyncGenerator[bytes]:
    while True:
        message = await self.__content_queue.receive(
            lambda m: m.type
            in ["response.audio.delta", "response.audio.done", "response.content_part.done", "error"]
        )
        if message is None:
            break
        if message.type == "response.content_part.done":  # <- This might be coming too early
            self._part = message.part
            break
        if message.type == "error":
            raise RealtimeException(message.error)
        if message.type == "response.audio.delta":
            yield base64.b64decode(message.delta)
        elif message.type == "response.audio.done":
            # We are skipping this as its information is already provided by 'response.content_part.done'
            continue

In your code, when you receive the item message you need to wait for both the content_part.done AND audio.done messages. I guess they arrive asynchronously, so if you get the audio.done first, the last chunks may get dropped.
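As a rough sketch of that idea (just my own rewrite of the audio_chunks method above, relying on the same imports and queue as rtclient - not a verified fix):

async def audio_chunks(self) -> AsyncGenerator[bytes]:
    # Keep draining deltas until BOTH response.audio.done and
    # response.content_part.done have arrived, instead of breaking
    # on whichever done event happens to show up first.
    audio_done = False
    part_done = False
    while not (audio_done and part_done):
        message = await self.__content_queue.receive(
            lambda m: m.type
            in ["response.audio.delta", "response.audio.done", "response.content_part.done", "error"]
        )
        if message is None:
            break
        if message.type == "error":
            raise RealtimeException(message.error)
        if message.type == "response.audio.delta":
            yield base64.b64decode(message.delta)
        elif message.type == "response.audio.done":
            audio_done = True
        elif message.type == "response.content_part.done":
            self._part = message.part
            part_done = True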

I would assume the issue is similar for others using JavaScript.

have a good week.

I'm using Java and had to implement the whole client myself, and I also experience it there, so it's probably not a JS-only issue

Hey yeah I also thought this was a potential issue initially but it's not. It's only breaking for content_part.done (not response.audio.done).

I've tried different things here and the tldr is that there aren't any more audio.delta messages being sent that we would theoretically be missing by breaking too early.

1 Like

I think you might be right. I just reproduced again.

To be clear, I have the same problem using OpenAI instead of Azure OpenAI. I tried with both, similar issue.

I haven't noticed a correlation with audio length personally; sometimes it cuts off for short sentences and sometimes for long ones.

Yup, same - I have also tried with various Python and JS clients and can reproduce the issue

1 Like

Not in the sense of longer vs. shorter, but in the sense of matching specific time-quantity multiples. E.g. the last audio chunk on the server side happens not to contain enough bytes to fill some buffer and is dropped on the OAI side (?). I am not an audio expert at all.
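To put rough numbers on that guess (assuming the default 16-bit mono PCM at 24 kHz, which I have not confirmed for these sessions), the 0.5-1 s of missing audio reported above would correspond to:

# Assuming 24 kHz, 16-bit mono PCM output
SAMPLE_RATE = 24000
BYTES_PER_SAMPLE = 2

for missing_seconds in (0.5, 1.0):
    missing_bytes = int(missing_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)
    print(f"{missing_seconds}s of audio = {missing_bytes} bytes")
# 0.5s -> 24000 bytes, 1.0s -> 48000 bytes

So if a server-side buffer were flushed in fixed-size chunks, a tail smaller than one chunk being dropped could plausibly match that pattern.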

Also, I wonder why it doesn't happen at all with the ChatGPT app, and how different the dev API is from the API they use themselves… why would they have two in fact… so confused with this.

Thanks for flagging this. I'm looking into this now, will get back once I have more info.

2 Likes