We are experiencing critical transcript contamination in the Realtime API. The response.audio_transcript.done event returns content from OTHER sessions, including internal tokenizer tokens that should never be exposed.
Environment
API: OpenAI Realtime API (WebSocket)
Model: gpt-realtime-2025-08-28 (GA)
Integration: Twilio Voice
Audio Format: G.711 μ-law
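For reference, this is roughly how the session is set up on our side. A minimal sketch, assuming the widely documented session.update shape; the exact field names (and the websockets header argument) may differ with your library version and the GA session schema.

```python
import json
import os

import websockets  # pip install websockets


async def open_realtime_session():
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2025-08-28"
    # Note: older websockets versions call this argument extra_headers.
    ws = await websockets.connect(
        url,
        additional_headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    # Both directions are G.711 u-law so Twilio Media Streams frames can be
    # passed through without transcoding.
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
        },
    }))
    return ws
```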
Timeline
December 20-22, 2025 (UTC)
Symptoms
Internal tokens exposed in transcripts:
<|vq_5450|>assistant<|vq_12900|>
Other sessions’ data appearing:
RPG game JSON: {"name": "MainQuestMission", "player_character": {"name": "Aria"}}
Truncated/corrupted responses:
{"name": "Dice Gobbler", "description
(cut off mid-JSON)
Unrelated generic responses:
“Sorry, but I can’t assist with that request.”
Docker deployment guides
Math table calculations
Key Observation
Users did NOT respond to the anomalous content, suggesting they heard correct audio but the transcript was contaminated. This indicates a disconnect between audio generation and transcript reporting.
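For anyone trying to confirm the same thing in their own logs, this is the kind of check we run when response.audio_transcript.done arrives. The <|vq_...|> regex and the helper names are our own convention (a sketch), not part of the API.

```python
import re

# Internal tokenizer markers we have seen leak into transcripts,
# e.g. "<|vq_5450|>assistant<|vq_12900|>".
INTERNAL_TOKEN_RE = re.compile(r"<\|vq_\d+\|>")


def looks_contaminated(transcript: str) -> bool:
    """Heuristic flag for transcripts that clearly don't belong to the session."""
    return bool(INTERNAL_TOKEN_RE.search(transcript))


def on_realtime_event(event: dict, call_sid: str) -> None:
    # The done event carries the full transcript of the assistant's audio for
    # that response; we log it against the Twilio Call SID so anomalies can be
    # matched to the call recording later.
    if event.get("type") == "response.audio_transcript.done":
        transcript = event.get("transcript", "")
        if looks_contaminated(transcript):
            print(f"[ANOMALY] call_sid={call_sid} transcript={transcript!r}")
```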
Impact
3 users affected
14 anomalous messages
Potential multi-tenant data isolation issue
Questions
Is anyone else experiencing similar issues?
Is there a known issue with the transcription pipeline?
How can we report this to the engineering team?
I have detailed logs with Twilio Call SIDs and timestamps available for OpenAI engineering if needed.
We had a similar issue today. The AI said something we never expected it to say; it felt like the stream was meant for some other user and came to us. Should I send the voice recording to you too?
1. We are an auto dealership tech company, so our prompt is built around that use case.
2. A user is talking to our agent, and the call (via Twilio) is going perfectly.
3. While the Realtime API is making a tool call and waiting for the tool response, it starts to say something very weird: “I just got promoted at work today. But it’s bittersweet. I wish Dad were here to share this moment with me. He’d be so proud. I know he is watching me. And I am so thankful to have you by my side through all of this.”
4. At this point the caller says, “Well, that was weird.”
5. The AI says, “Yeah, that is odd, can you please provide me …….” At this point it came back to normal.
6. We can’t see the utterance from #3 in the transcript, only the voice. I have the audio.
Thanks for sharing this. Really helpful to see we’re not alone. Also, that “I wish Dad were here” line is… quite something. Someone’s having a pretty deep therapy session and it just leaked into your auto dealership call. Privacy nightmare meets awkward comedy.
We’re seeing something similar but opposite: in our case the transcript is contaminated (RPG game data, JavaScript code, internal tokens like <|vq_5450|>) while users kept saying “Hello? Can you hear me?”, so the audio might not have played at all.
Unfortunately we don’t have call recordings on our end so can’t verify what users actually heard. Your audio evidence is really valuable.
What model version are you on? We’re using gpt-realtime-2025-08-28.
We are using the same model. We never had this problem in the past; it happened today at 8:50 AM PST.
BTW, I DMed you too. We have one other issue: our voice on Twilio is jittery and skips parts of words / sounds slurred. Do you have a similar issue? Can you please share what settings you are using between Twilio and OpenAI?
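On the jitter question, here is a stripped-down version of how we bridge Twilio Media Streams and the Realtime API. The Twilio-side event names are standard Media Streams messages; the function names, the websocket objects, and the response.audio.delta event name (which the GA schema may call response.output_audio.delta) are assumptions on our side, so treat this as a sketch rather than a known fix.

```python
import json


async def twilio_to_openai(twilio_ws, openai_ws):
    """Caller audio -> model. Twilio media payloads are already base64
    G.711 u-law at 8 kHz, so they are appended to the input buffer as-is,
    with no resampling or re-encoding step that could distort the audio."""
    async for message in twilio_ws:
        msg = json.loads(message)
        if msg.get("event") == "media":
            await openai_ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": msg["media"]["payload"],
            }))


async def openai_to_twilio(openai_ws, twilio_ws, stream_sid: str):
    """Model audio -> caller. Each audio delta is wrapped in a Twilio
    media message for the same stream SID and sent straight through."""
    async for message in openai_ws:
        event = json.loads(message)
        if event.get("type") == "response.audio.delta":
            await twilio_ws.send(json.dumps({
                "event": "media",
                "streamSid": stream_sid,
                "media": {"payload": event["delta"]},
            }))
```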