Creepy bug in Realtime API + Function Calling: Extra Audio Not in Transcription

This is actually two bugs combined:

  1. Arguments for Function Calling appear inside the text transcription.
  2. When this happens, the generated audio contains unrelated content instead of voicing these arguments. Often, this content has no connection to the conversation topic and can even be in a different language.

In my example, the audio contains more than twice as much speech as the transcription, becomes chaotic toward the end, and includes repetitions.

Video (with generated audio) of how this session went; the untranscribed audio starts at 0:27: openai_realtime_session_4c740659-88bb-4c56-9765-a80240491b76_coral.mp4 - Google Drive

Full websocket session data exchange: openai_realtime_session_4c740659-88bb-4c56-9765-a80240491b76_coral.txt - Google Drive

Screenshot:


Another example.
In this case, the audio unexpectedly switches from Russian to Japanese at 0:21, right after voicing the transcribed text: openai_realtime_session_ee51d0bc-a878-41a5-bd28-5d408f60dc1d_coral.mp4 - Google Drive

Full websocket session data exchange: openai_realtime_session_ee51d0bc-a878-41a5-bd28-5d408f60dc1d_coral.txt - Google Drive

Let me know if you need any additional logs or details to help debug this issue.

2 Likes

I’ve also seen this! However, not to the scale of 30 seconds.

1 Like

Maybe that’s why; we’ve always seen weird glitches when they’re updating something.

@hugebelts, I’m not sure; I’ve seen the bug reproducing for a few days already. I also see very similar issues in the "Related" section.

Looks like the bug has been present since October.

2 Likes

Which of the OpenAI realtime API voices is this? Coral?

I’ve never had any extraneous spoken dialog in my interactions, so I’m interested to find out what might be the cause.

Sending blank audio at odd times can cause issues, and very low-quality audio as input can also confuse the model. Do you happen to keep a copy of the input audio to check against?
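
Something like this is usually enough to keep a debug copy of everything you send, so you can replay it and listen for blank or garbled segments (a rough Python sketch; it assumes the default pcm16 input format at 24 kHz mono, and the class and file names are just placeholders):

```python
# Rough sketch: tee every audio chunk sent to the Realtime API into a local
# WAV file for later inspection. Assumes the session uses the default pcm16
# input format (16-bit mono PCM at 24 kHz); adjust if yours differs.
import base64
import json
import wave

class InputAudioRecorder:
    def __init__(self, path="input_debug.wav", sample_rate=24000):
        self.wav = wave.open(path, "wb")
        self.wav.setnchannels(1)   # mono
        self.wav.setsampwidth(2)   # 16-bit samples
        self.wav.setframerate(sample_rate)

    def on_outgoing_event(self, event_json: str):
        """Call this for every event you send over the websocket."""
        event = json.loads(event_json)
        if event.get("type") == "input_audio_buffer.append":
            self.wav.writeframes(base64.b64decode(event["audio"]))

    def close(self):
        self.wav.close()
```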

1 Like

@Foxalabs, I don’t think it depends on the voice. I reproduced the bug with coral, shimmer, and sage. I haven’t tested with other voices.

1 Like

Gotcha, I’ve never managed to get that effect, so I wonder what you’re doing differently. Do you have your end-to-end speech code to look at?

Personally, I just handle the socket comms to the OpenAI endpoint and do it manually; not sure if this is with WebRTC or not?

@Foxalabs, I’m also working with the Realtime API manually via websocket. See the full logs of the websocket sessions, which I posted in the original message above.

I’m asking the Realtime model to do both: respond to the user and also call the function set_emotion. Sometimes it works as expected, but sometimes this crazy bug appears. I suspect that the function call incorrectly appearing in the response text is what causes the audio to go crazy.
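
For context, here is roughly how I set up the session (a simplified Python sketch, not my exact code; the model name, instructions, and the set_emotion schema are placeholders for illustration):

```python
# Rough sketch of the session setup over a raw websocket connection.
# Model name, instructions, and the set_emotion schema are illustrative only.
import asyncio, json, os
import websockets  # pip install websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Older versions of the websockets package use extra_headers= instead.
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "voice": "coral",
                "instructions": "Answer the user out loud and always call "
                                "set_emotion with the emotion of your reply.",
                "tools": [{
                    "type": "function",
                    "name": "set_emotion",
                    "description": "Report the emotion of the assistant's reply.",
                    "parameters": {
                        "type": "object",
                        "properties": {"emotion": {"type": "string"}},
                        "required": ["emotion"],
                    },
                }],
                "tool_choice": "auto",
            },
        }))
        # ...then stream input_audio_buffer.append events and read the
        # response audio / function call argument events from the socket...

asyncio.run(main())
```

Most of the time the function call comes back through the function-call events as expected; in the buggy sessions, the arguments also show up inside the text transcription, as described above.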

2 Likes

I’m still seeing this in some cases, and I don’t think it’s a function call issue, since I don’t use function calling. The AI sometimes also starts spewing text from previous responses that isn’t included in the transcription, then transitions to what it’s saying now.

1 Like

I’ll make sure I raise it at our next meeting with OAI. The Realtime API is still in beta and does have a number of issues, most of which are already logged and being worked on. Big fan of the low-latency speech APIs myself; the potential is huge.

2 Likes

Hi @Foxalabs!
Did you have a chance to tell OpenAI about the bug at the meeting?

Experiencing the same thing. Really odd and a bit creepy

I realised that sometimes it happens without function calling too.

Yeah, this happens to me too. Not often, but often enough. The transcription looks fine, but the audio contains very strange (and yes, creepy) sentences.