Sometimes (about every 5th request) the audio cuts off at the end, even though we receive the whole transcription in the console.
Languages: Kazakh, Russian
I also experience this, in English. I’m not sure about the frequency (e.g. every 5th request), but it definitely happens every once in a while, and it’s always towards the end of sentences. Are you sending any response.cancel requests? My suspicion on my end was a bug in my application code that calls response.cancel incorrectly or prematurely, before playback can finish.
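If you suspect your own code, one way to check is to route every cancel through a single wrapper that logs the reason and whether a response was actually in progress. The sketch below is hypothetical: CancelAuditor, cancel(), on_server_event() and the send callback are illustrative names, not part of any SDK. It assumes server events arrive as decoded dicts and uses the response.created and response.done server events to track state. Since response.done typically arrives before local playback has finished, any code path that also flushes your playback buffer on "cancel" is worth logging the same way.

```python
import json
import time
from typing import Callable, List, Optional


class CancelAuditor:
    """Hypothetical wrapper around whatever function sends client events.

    Tracks whether a response is in progress (between the server's
    response.created and response.done events) and records every attempt
    to cancel, so premature cancels show up in audit_log.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self.in_progress = False
        self.audit_log: List[dict] = []

    def on_server_event(self, event: dict) -> None:
        # Follow the response lifecycle from server events.
        kind = event.get("type")
        if kind == "response.created":
            self.in_progress = True
        elif kind == "response.done":
            self.in_progress = False

    def cancel(self, reason: str, now: Optional[float] = None) -> bool:
        # Record the attempt first, whether or not it is actually sent.
        self.audit_log.append({
            "reason": reason,
            "in_progress": self.in_progress,
            "time": time.monotonic() if now is None else now,
        })
        if not self.in_progress:
            # No response in progress on the server, so skip the event.
            return False
        self._send(json.dumps({"type": "response.cancel"}))
        return True
```

Reviewing audit_log after a clipped response shows whether your code asked for a cancel around that time, and why.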
I’m not sending any response.cancel events and I’m experiencing this too. I can’t say anything confident about the frequency, but it’s VERY frequent.
Happens for me too. The voice cuts off randomly towards the end of the last sentence.
Did anyone find a solution? I’m using the Twilio API and Node.js, and the audio is cut off basically every other response. Very frustrating.
This bug is related to the content filter.
(Edit: it seems to happen not just with the content filter, but randomly as well.)
Currently about 5-10% of messages are incorrectly classified and cut off.
The weird thing is that they don’t reject the request; they just stop streaming data at the end. It makes the UX really bad.
Using another TTS such as ElevenLabs may be necessary.
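If the content filter is involved, the response.done event should say so. As we understand the Realtime API reference, response.done carries response.status ("completed", "cancelled", "failed" or "incomplete") and, for responses that did not complete, response.status_details.reason (for example "content_filter", "max_output_tokens", "turn_detected" or "client_cancelled"). The field names here are an assumption to verify against the current docs, and truncation_reason is a hypothetical helper. Logging its result per response shows which cut-offs are filter-related; a clipped response that still reports "completed" would point to something else, consistent with the edit in the post above.

```python
from typing import Optional


def truncation_reason(response_done: dict) -> Optional[str]:
    """Return why a response ended early, or None if it completed.

    Assumed payload shape (check against the Realtime API reference):
    event["response"]["status"] and
    event["response"]["status_details"]["reason"].
    """
    response = response_done.get("response", {})
    status = response.get("status")
    if status == "completed":
        return None
    details = response.get("status_details") or {}
    # Prefer the specific reason; fall back to the bare status.
    return details.get("reason") or status or "unknown"
```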
I also encounter this issue. I don’t use response.cancel anywhere; it randomly returns incomplete audio, which severely degrades the experience.
I’ve also been encountering this issue a lot. I’m using AzureOpenAI and a modified version of the aoai-realtime-audio-sdk python rtclient.
Some observations:
Hoping they’ll fix this soon, or at least identify a reason or pattern, because I have no idea why this is happening.
Same issue. I’m sure they will fix it soon, as it appears to be inherent to the model/API itself.
Same here. And guys, this is not random: IMO it depends on the length of the audio response, which is why it looks random. I am also using the AzureOpenAI aoai-realtime-audio-sdk (version 0.5.2, which I built myself; the previous one did not handle function call responses correctly…)
EDIT: IN FACT THIS DIDN’T SOLVE THE PROBLEM. THE LAST 0.5-1 SECONDS OF AUDIO ARE MISSING MOST OF THE TIME NOW. PLEASE FIX THIS, OPENAI.
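To test whether cut-offs correlate with response length, it helps to log how much audio actually arrived for each response. The sketch assumes the pcm16 output format (16-bit mono PCM at 24 kHz); received_audio_seconds is a hypothetical helper name, and the constants need changing for other formats. Comparing this figure with the transcript, or with the 0.5-1 s that seems to go missing, gives concrete numbers to include in a bug report.

```python
import base64
from typing import Iterable

# Assumption: output audio is pcm16, i.e. 16-bit mono PCM sampled at 24 kHz.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


def received_audio_seconds(b64_deltas: Iterable[str]) -> float:
    """Decode the base64 payloads of response.audio.delta events and
    convert the total byte count into seconds of audio."""
    total_bytes = sum(len(base64.b64decode(d)) for d in b64_deltas)
    return total_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```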
In fact, I think I found the problem. I really don’t have time to dig into the code, lol. It’s funny that some still think we are heading toward a future of total abundance in 2-3 years, just like magic, without a massive human effort…
In the rtclient, in the RTAudioContent class:
```python
import base64
from collections.abc import AsyncGenerator


class RTAudioContent:
    # Excerpt only: __content_queue, _part and RealtimeException come from the rtclient SDK.
    ...

    async def audio_chunks(self) -> AsyncGenerator[bytes, None]:
        while True:
            message = await self.__content_queue.receive(
                # The filter must admit every type handled below, otherwise those branches never run.
                lambda m: m.type in (
                    "response.audio.delta",
                    "response.audio.done",
                    "response.content_part.done",
                    "error",
                )
            )
            if message is None:
                break
            if message.type == "response.content_part.done":  # <-- This might be coming too early
                self._part = message.part
                break
            if message.type == "error":
                raise RealtimeException(message.error)
            if message.type == "response.audio.delta":
                yield base64.b64decode(message.delta)
            elif message.type == "response.audio.done":
                # Skipped: its information is already provided by 'response.content_part.done'
                continue
```
In your code, when you receive the item message, you need to wait for both the content_part.done AND audio.done messages. I guess they arrive asynchronously, so if audio.done arrives first, the last chunks may get dropped.
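That suggestion could be sketched as a consumer that keeps collecting audio.delta chunks until it has seen both terminal events, in whichever order they arrive. collect_audio and the queue of decoded event dicts are assumptions for illustration, not the rtclient API. As pointed out further down the thread, this may not change anything in practice if no deltas actually arrive after content_part.done.

```python
import asyncio
import base64
from typing import List


async def collect_audio(events: "asyncio.Queue[dict]") -> bytes:
    """Hypothetical helper: gather audio for one content part and return
    only after BOTH response.audio.done and response.content_part.done
    have been seen. `events` is assumed to carry decoded server events
    for a single item/content part, in arrival order."""
    chunks: List[bytes] = []
    seen_audio_done = False
    seen_part_done = False
    while not (seen_audio_done and seen_part_done):
        event = await events.get()
        kind = event.get("type")
        if kind == "response.audio.delta":
            # Keep appending even if one terminal event has already arrived.
            chunks.append(base64.b64decode(event["delta"]))
        elif kind == "response.audio.done":
            seen_audio_done = True
        elif kind == "response.content_part.done":
            seen_part_done = True
        elif kind == "error":
            raise RuntimeError(event.get("error"))
    return b"".join(chunks)
```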
I would assume the issue is similar for others using JavaScript.
Have a good week.
I’m using Java and had to implement the whole client myself, and I experience it there too, so it’s probably not purely a JS issue.
Hey, yeah, I also thought this was a potential issue initially, but it’s not. The loop only breaks on content_part.done (not response.audio.done).
I’ve tried different things here, and the TL;DR is that no further audio.delta messages are sent after that point, so we aren’t missing any by breaking too early.
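One way to confirm this on your own traffic is to record the sequence of server event types for a content part and count any audio.delta that arrives after content_part.done. late_audio_deltas is a hypothetical helper; it assumes the log covers a single item/content index, since deltas for other parts could otherwise be miscounted.

```python
from typing import Iterable


def late_audio_deltas(event_types: Iterable[str]) -> int:
    """Count response.audio.delta events that arrive after
    response.content_part.done in a recorded log for one content part.
    A non-zero count would mean breaking on content_part.done drops audio."""
    seen_part_done = False
    late = 0
    for kind in event_types:
        if kind == "response.content_part.done":
            seen_part_done = True
        elif kind == "response.audio.delta" and seen_part_done:
            late += 1
    return late
```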
I think you might be right. I just reproduced it again.
To be clear, I have the same problem using OpenAI instead of Azure OpenAI. I tried both, with similar results.
I haven’t personally noticed a correlation with audio length; sometimes it cuts off on short sentences and sometimes on long ones.
Yup, same. I’ve also tried various Python and JS clients and can reproduce the issue.
Not in the sense of longer vs. shorter, but in the sense of matching multiples of a specific time quantum. E.g. the last audio chunk on the server side happens not to contain enough bytes to fill some buffer and is dropped on the OpenAI side (?). I am not an audio expert at all.
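To illustrate this buffer hypothesis (a guess, not confirmed server behavior), here is how a fixed-size framer loses the tail when it only emits full frames, and how zero-padding the last frame avoids that. split_frames, frame_bytes and flush_tail are illustrative names. The same pattern can also occur in client code that forwards audio to a player or telephony stream in fixed frames, so that may be worth checking too. Zero bytes are silence for PCM16, but not for every codec (G.711 μ-law silence is 0xFF).

```python
from typing import List


def split_frames(pcm: bytes, frame_bytes: int, flush_tail: bool) -> List[bytes]:
    """Split audio into fixed-size frames.

    flush_tail=False silently drops a trailing partial frame (the
    hypothesized failure mode); flush_tail=True zero-pads it, which is
    silence for PCM16, so no audio is lost.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    full = len(pcm) // frame_bytes * frame_bytes
    frames = [pcm[i:i + frame_bytes] for i in range(0, full, frame_bytes)]
    tail = pcm[full:]
    if tail and flush_tail:
        frames.append(tail + bytes(frame_bytes - len(tail)))
    return frames
```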
Also, I wonder why it doesn’t happen at all in the ChatGPT app, and how different the developer API is from the one they use themselves… why would they have two, in fact… I’m so confused by this.
Thanks for flagging this. I’m looking into it now and will get back once I have more info.
Following - this has been happening extremely consistently. Really looking for an update/fix here. Happy to provide more info if needed.