[Realtime API] Agent responding to microphone input that did not become part of transcription

I have noticed curious cases where the agent was clearly responding to background dialog (e.g. a podcast interview playing nearby), but that dialog never made it into the transcription update returned by the server.

It is as if the model has access to the unfiltered microphone input, while the transcription returned to the client passes through an additional filter of some kind. The model is also most definitely reacting to emotion and loudness in the voice, neither of which shows up in the transcript.

Has anybody else noticed something similar? In my opinion, the transcription returned to the client should be more detailed and include all of this.
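For context, this is roughly how I pull the user-audio transcripts out of the server events on the websocket (a minimal sketch; the event type and `transcript` field follow the Realtime API docs for `conversation.item.input_audio_transcription.completed`, assuming the session was configured with `input_audio_transcription` enabled):

```python
import json

def extract_user_transcript(raw_event: str):
    """Return the transcript string when the event is a completed
    input-audio transcription; return None for any other event type."""
    event = json.loads(raw_event)
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        return event.get("transcript")
    return None

# Example events as they arrive on the websocket (shapes per the docs):
done = json.dumps({
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_123",
    "transcript": "hello there",
})
other = json.dumps({"type": "response.audio.delta", "delta": "..."})

print(extract_user_transcript(done))   # "hello there"
print(extract_user_transcript(other))  # None
```

With this in place I log every transcript the server sends, and the background dialog the agent reacted to simply is not in any of them.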