Sometimes (about every 5th request) the audio cuts off at the end, but we still receive the whole transcription in the console.
Language [Kazakh, Russian]
I also experience this, in English. Not sure about the frequency (e.g. every 5th request), but it definitely happens every once in a while. It's always towards the end of sentences. Are you sending any response.cancel requests? My suspicion on my end was that it is a bug in my application code that's calling response.cancel incorrectly/prematurely, before the playback can finish.
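If premature response.cancel calls are the suspect, one way to rule them out is to hold the cancel until the audio received so far has had time to play. Below is a minimal sketch, assuming 16-bit mono PCM at 24 kHz (adjust the defaults to whatever output format you configured). Both helper names are hypothetical and not part of any SDK.

```python
def playback_seconds(total_bytes: int, sample_rate: int = 24_000,
                     bytes_per_sample: int = 2) -> float:
    """Estimate how long `total_bytes` of mono PCM audio takes to play.

    The 24 kHz / 16-bit defaults are an assumption about the output format.
    """
    return total_bytes / (sample_rate * bytes_per_sample)


def safe_to_cancel(total_bytes: int, elapsed_seconds: float) -> bool:
    """True once enough wall-clock time has passed to play everything received."""
    return elapsed_seconds >= playback_seconds(total_bytes)
```

If the cut-offs persist even when cancels are only sent once safe_to_cancel returns True, the application-side cancel is probably not the cause.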
I'm not doing any response.cancel events and experiencing this too. Can't confidently say anything about the frequency but it's VERY frequent
Happens for me too. Voice is cutting off randomly towards end of the last sentence.
Did anyone find a solution? I'm using the Twilio API and Node.js, and it's being cut off basically every other response. Very frustrating.
This bug is related to the content filter.
(edit: it seems to be happening not just with content filter, but randomly as well)
Currently about 5-10% of messages are incorrectly classified and cut off.
The weird thing is that they don't reject the request, they only stop streaming data at the end. It makes the UX really bad.
It may be necessary to use another TTS such as ElevenLabs.
I also encounter this issue. I do not use response.cancel anywhere; it randomly responds with incomplete audio, which severely degrades the experience.
I've also been encountering this issue a lot. I'm using AzureOpenAI and a modified version of the aoai-realtime-audio-sdk python rtclient.
Some observations:
Hoping they'll fix this soon or at least identify a reason/pattern, because I have no idea why this is happening.
Same issue, I am sure they will fix it soon as it appears to be an issue inherent to the model / api itself.
Same here. And guys, this is not random; it seems to depend on the length of the audio response, imo. That's why it looks random. I am also using the AzureOpenAI aoai-realtime-audio-sdk (version 0.5.2, which I built myself; the previous one was not handling function call responses correctly...)
EDIT: IN FACT THIS DIDN'T SOLVE THE PROBLEM. THE LAST 0.5-1 SECONDS OF AUDIO ARE MISSING MOST OF THE TIME NOW. PLEASE FIX THIS, OPENAI.
In fact I think I found the problem. I really don't have time to dig into the code lol. It's funny some still think we are heading in 2-3 years to a future of total abundance, just like magic. Without a massive human effort...
In the rtclient, the RTAudioContent class:

    async def audio_chunks(self) -> AsyncGenerator[bytes]:
        while True:
            message = await self.__content_queue.receive(
                lambda m: m.type in ["response.audio.delta", "response.audio.done"]
            )
            if message is None:
                break
            if message.type == "response.content_part.done":  # <- This might be coming too early
                self._part = message.part
                break
            if message.type == "error":
                raise RealtimeException(message.error)
            if message.type == "response.audio.delta":
                yield base64.b64decode(message.delta)
            elif message.type == "response.audio.done":
                # We are skipping this as its information is already provided by 'response.content_part.done'
                continue

In your code, when you receive the item message you need to wait for both the content_part.done AND audio.done messages. I guess they arrive asynchronously, so if one of them shows up first and you stop there, the last chunks may get dropped.
I would assume the issue is similar for others using JavaScript.
have a good week.
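The suggestion above, to keep reading until both content_part.done and audio.done have arrived, could be sketched roughly as follows. This is not the rtclient implementation: events are modelled as plain dicts and collect_audio is a hypothetical name. Only the event type strings and the base64 delta field come from the snippet quoted in the thread.

```python
import base64
from typing import Iterable, Iterator

# Terminal events that must BOTH be seen before the audio part is treated as finished.
TERMINAL_EVENTS = {"response.audio.done", "response.content_part.done"}


def collect_audio(events: Iterable[dict]) -> Iterator[bytes]:
    """Yield decoded audio chunks until both terminal events have been seen."""
    seen = set()
    for event in events:
        kind = event["type"]
        if kind == "response.audio.delta":
            # Deltas are still yielded after the first terminal event.
            yield base64.b64decode(event["delta"])
        elif kind in TERMINAL_EVENTS:
            seen.add(kind)
            if seen == TERMINAL_EVENTS:
                break
```

This only helps if deltas really do arrive after the first terminal event; if none do, the truncation happens before the client ever sees the data.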
I'm using Java and had to implement the whole client myself, and I also experience it there, so it's probably not 100% a JS issue.
Hey yeah, I also thought this was a potential issue initially but it's not. It's only breaking for content_part.done (not response.audio.done).
I've tried different things here and the tl;dr is that there aren't any more audio.delta messages being sent that we would theoretically be missing by breaking too early.
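For anyone who wants to repeat that check on their own traffic, here is a small sketch of one way to do it: log the type of every server event for a response, then count the audio deltas that show up after the first "done" event. The function name is hypothetical, and the input is assumed to be a list of event type strings in arrival order.

```python
DONE_EVENTS = ("response.audio.done", "response.content_part.done")


def late_deltas(event_types: list[str]) -> int:
    """Count audio deltas that arrive after the first 'done' event."""
    done_seen = False
    late = 0
    for kind in event_types:
        if kind in DONE_EVENTS:
            done_seen = True
        elif kind == "response.audio.delta" and done_seen:
            late += 1
    return late
```

A result of 0 on truncated responses would match the observation above that breaking early is not what loses the audio.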
I think you might be right. I just reproduced again.
To be clear, I have the same problem using OpenAI instead of Azure OpenAI. I tried with both, similar issue.
I haven't noticed a correlation with audio length personally; sometimes it cuts off for short sentences and sometimes for long ones.
yup same, have also tried w/ various python and JS clients and can reproduce the issue
Not in the sense of longer vs shorter, but in the sense of matching multiples of some specific time quantity. E.g. the last audio chunk on the server side happens not to contain enough bytes to fill some buffer and is dropped on the OAI side (?). I am not an audio expert at all.
Also, I wonder why it doesn't happen at all with the ChatGPT app, and how different the dev API is from the API they use themselves... why would they have two, in fact... so confused with this.
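The partial-buffer hypothesis above could be probed from the client side by checking whether the total audio received for a response ends on a frame boundary. The sketch below is only a diagnostic. It assumes 16-bit mono PCM at 24 kHz and a 20 ms frame, both of which are guesses because the thread does not say what buffer size the server uses. The function name is hypothetical.

```python
def trailing_partial_bytes(total_bytes: int, frame_ms: int = 20,
                           sample_rate: int = 24_000,
                           bytes_per_sample: int = 2) -> int:
    """Return how many bytes are left over after the last complete frame.

    The frame size and audio format defaults are assumptions, not known server values.
    """
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    return total_bytes % frame_bytes
```

Logging this value for both truncated and complete responses would show whether cut-offs line up with any particular remainder.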
Thanks for flagging this. I'm looking into this now, will get back once I have more info.