[Realtime API] Audio is randomly cutting off at the end

Sometimes (roughly every 5th request) the audio cuts off at the end, but we still receive the whole transcription in the console.

Language [Kazakh, Russian]

6 Likes

I also experience this, in English. Not sure about the frequency (e.g. every 5th request), but it definitely happens every once in a while. It's always towards the end of sentences. Are you sending any response.cancel requests? My suspicion on my end was that it is a bug in my application code that's calling response.cancel incorrectly/prematurely, before the playback can finish.
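In case it helps anyone checking the same thing, here is a minimal sketch of how one might gate the cancel call - assuming a raw async websocket client (e.g. the websockets package); ResponseTracker and its methods are hypothetical names of mine, not from any SDK:

import json

class ResponseTracker:
    """Only send response.cancel while a response is actually streaming."""

    def __init__(self, ws):
        self.ws = ws          # an already-open async websocket to the Realtime API
        self.active = False   # True between response.created and response.done

    def on_server_event(self, event: dict):
        # Feed every decoded (JSON) server event through here.
        if event["type"] == "response.created":
            self.active = True
        elif event["type"] == "response.done":
            self.active = False

    async def cancel_if_active(self):
        # Cancelling after the response has already finished can itself look
        # like the audio being cut off, so skip the call in that case.
        if self.active:
            await self.ws.send(json.dumps({"type": "response.cancel"}))
            self.active = False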

2 Likes

I'm not doing any response.cancel events and experiencing this too. Can't confidently say anything about the frequency but it's VERY frequent

1 Like

Happens for me too. The voice cuts off randomly towards the end of the last sentence.

1 Like

Did anyone find a solution? I'm using the Twilio API and Node.js, and the audio is being cut off in basically every other response. Very frustrating.

1 Like

This bug is related to the content filter.
(edit: it seems to be happening not just with content filter, but randomly as well)
Currently about 5-10% of messages are incorrectly classified and cut off.
The weird thing is that they don't reject the request; they only stop streaming data at the end. It makes the UX really bad.

Potentially, using another TTS like ElevenLabs will be necessary.

1 Like

I also encounter this issue. I do not use response.cancel anywhere; it randomly responds with incomplete audio, which severely degrades the experience.

1 Like

I've also been encountering this issue a lot. I'm using AzureOpenAI and a modified version of the aoai-realtime-audio-sdk python rtclient.

Some observations:

  • Tends to occur most frequently on the first audio response, sometimes up to 50% of the time
  • The audio transcript returned by the realtime api is always complete
  • I've verified that the audio I'm receiving from the realtime api is truncated by sending it directly to another STT, which also produces a truncated transcript - so it's not an issue with how I'm sending audio back to my client (see the sketch after this list)
  • I've manually inspected all the websocket messages received from the realtime api - there are no errors. A user above said sometimes the content filter triggers this, but per the OpenAI docs the content moderation cutoff would result in an error event - which I definitely do not receive when this issue occurs
  • Even when I remove all session updates, conversation deletes, response cancellations, etc. - so literally all I'm sending to the api is input audio buffer append messages - I still encounter this issue
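For reference, the truncation check mentioned above boils down to something like this - a minimal sketch assuming the default pcm16 output (16-bit mono at 24 kHz) and plain decoded JSON events; the accumulation code is mine, not from any SDK:

import base64

SAMPLE_RATE = 24000     # assuming the default Realtime API output: 16-bit mono PCM at 24 kHz
BYTES_PER_SAMPLE = 2

audio_bytes = bytearray()
transcript = ""

def on_server_event(event: dict):
    """Accumulate audio deltas and compare the implied duration with the transcript."""
    global transcript
    if event["type"] == "response.audio.delta":
        audio_bytes.extend(base64.b64decode(event["delta"]))
    elif event["type"] == "response.audio_transcript.done":
        transcript = event["transcript"]
    elif event["type"] == "response.done":
        seconds = len(audio_bytes) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
        # A complete transcript paired with suspiciously little audio suggests
        # the truncation happens server-side, before the bytes ever reach us.
        print(f"received {seconds:.2f}s of audio for transcript: {transcript!r}")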

Hoping they'll fix this soon or at least identify a reason/pattern because I have no idea why this is happening

5 Likes

Same issue. I am sure they will fix it soon, as it appears to be an issue inherent to the model / api itself.

2 Likes

Same here. And guys, this is not random - it seems to depend on the length of the audio response imo; that's why it looks random. I am also using the AzureOpenAI aoai-realtime-audio-sdk (version 0.5.2 that I built; the previous one was not correctly handling function call responses…)

EDIT: IN FACT THIS DIDN'T SOLVE THE PROBLEM. THE LAST 0.5-1 SECONDS OF AUDIO IS MISSING MOST OF THE TIME NOW. PLEASE FIX THIS OPENAI.

In fact I think I found the problem. I really don't have time to dig into the code lol. It's funny that some still think we are heading, in 2-3 years, to a future of total abundance just like magic. Without a massive human effort…

In the rtclient the RTAudioContent class:

async def audio_chunks(self) -> AsyncGenerator[bytes]:
    while True:
        message = await self.__content_queue.receive(
            lambda m: m.type
            in ["response.audio.delta", "response.audio.done", "response.content_part.done", "error"]
        )
        if message is None:
            break
        if message.type == "response.content_part.done":  # <- This might be coming too early
            self._part = message.part
            break
        if message.type == "error":
            raise RealtimeException(message.error)
        if message.type == "response.audio.delta":
            yield base64.b64decode(message.delta)
        elif message.type == "response.audio.done":
            # We are skipping this as its information is already provided by 'response.content_part.done'
            continue

In your code, when you receive the item message you need to wait for both the content_part.done AND audio.done messages. I guess they arrive asynchronously, so if you get the audio.done first, the last chunks may get dropped.
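As a rough sketch of that idea (just my own rewrite of the audio_chunks method above, relying on the same imports and queue as rtclient - not a verified fix):

async def audio_chunks(self) -> AsyncGenerator[bytes]:
    # Keep draining deltas until BOTH response.audio.done and
    # response.content_part.done have arrived, instead of breaking
    # on whichever done event happens to show up first.
    audio_done = False
    part_done = False
    while not (audio_done and part_done):
        message = await self.__content_queue.receive(
            lambda m: m.type
            in ["response.audio.delta", "response.audio.done", "response.content_part.done", "error"]
        )
        if message is None:
            break
        if message.type == "error":
            raise RealtimeException(message.error)
        if message.type == "response.audio.delta":
            yield base64.b64decode(message.delta)
        elif message.type == "response.audio.done":
            audio_done = True
        elif message.type == "response.content_part.done":
            self._part = message.part
            part_done = True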

I would assume the issue is similar for others using JavaScript.

have a good week.

I'm using Java and had to implement the whole client myself, and I also experience it there, so it's probably not a JS-only issue

Hey yeah I also thought this was a potential issue initially but it's not. It's only breaking for content_part.done (not response.audio.done).

I've tried different things here and the tldr is that there aren't any more audio.delta messages being sent that we would theoretically be missing by breaking too early.

1 Like

I think you might be right. I just reproduced again.

To be clear, I have the same problem using OpenAI instead of Azure OpenAI. I tried with both, similar issue.

I haven't noticed a correlation with audio length personally; sometimes it cuts off for short sentences and sometimes for long ones.

Yup, same - I have also tried with various Python and JS clients and can reproduce the issue

1 Like

Not in the sense of longer vs. shorter, but in the sense of matching specific time-quantity multiples. E.g. the last audio chunk on the server side happens not to contain enough bytes to fill some buffer and is dropped on the OAI side (?). I am not an audio expert at all.
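To put rough numbers on that guess (assuming the default 16-bit mono PCM at 24 kHz, which I have not confirmed for these sessions), the 0.5-1 s of missing audio reported above would correspond to:

# Assuming 24 kHz, 16-bit mono PCM output
SAMPLE_RATE = 24000
BYTES_PER_SAMPLE = 2

for missing_seconds in (0.5, 1.0):
    missing_bytes = int(missing_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)
    print(f"{missing_seconds}s of audio = {missing_bytes} bytes")
# 0.5s -> 24000 bytes, 1.0s -> 48000 bytes

So if a server-side buffer were flushed in fixed-size chunks, a tail smaller than one chunk being dropped could plausibly match that pattern.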

Also, I wonder why it doesn't happen at all with the ChatGPT app, and how different the dev API is from the API they use themselves… why would they have two in fact… so confused with this.

Thanks for flagging this. I'm looking into this now, will get back once I have more info.

2 Likes