Issue with Mismatch Between Realtime Audio and Transcription Text

Hi everyone,

I’m encountering an issue when using the Realtime API for word teaching. For example, I’m teaching words like “apple,” “banana,” and “orange” in sequence, but sometimes the transcription text doesn’t match the actual audio content.

For instance, when the audio plays “please repeat after me, apple,” the transcription text shows “please repeat after me, banana.” Has anyone else faced this issue, and how did you resolve it?

Looking forward to any insights or solutions. Thanks in advance!


I am facing the same problem.

I am using the Realtime API in Japanese. The transcription text says “Monday to Friday” while the audio actually pronounces “Thursday to Friday”, the transcription says “11,000 yen” while the audio pronounces “111,000 yen”, and so on. In both cases the transcription is the intended answer and the spoken audio is wrong.

The mistake is reproducible: no matter how many times you ask, it makes the same error.

Also, if you know how the AI tends to make mistakes, you can correct them to some extent by instruction. For example, the pronunciation could be corrected with a prompt like “For amounts of money, please convert and output them as ‘11 thousand yen’ instead of ‘11,000 yen’.”
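For reference, this is roughly how I send that kind of instruction as a session.update event. It is only a minimal sketch: the wording of the instruction is just an example, and `ws` stands for whatever already-open Realtime WebSocket connection you have in your own code.

```python
import json

def send_correction_instructions(ws) -> None:
    # `ws` is assumed to be an already-open WebSocket connection to the
    # Realtime API (e.g. from the websocket-client library).
    # session.update replaces the session instructions mid-session.
    instructions = (
        "For amounts of money, convert and output them as "
        "'11 thousand yen' instead of '11,000 yen'."
    )
    event = {
        "type": "session.update",
        "session": {"instructions": instructions},
    }
    ws.send(json.dumps(event))
```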

They are two different models acting at the same time, so simply put, either the realtime model is not “hearing” the audio correctly or the Whisper one is not, depending on which output is correct. They don’t know about each other (I don’t think the Whisper transcript gets into the realtime model’s context at all; perhaps you could feed it back with conversation.item.create events if you want to try, roughly as sketched below).
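If you do want to experiment with that, something like the following is what I had in mind. The event shape is what I believe conversation.item.create expects; `feed_transcript_back` and `ws` are just my own placeholders, not anything official, so treat this as a sketch rather than a known fix.

```python
import json

def feed_transcript_back(ws, transcript_text: str) -> None:
    # Inject the transcript of the last spoken response back into the
    # conversation as a text item, so the realtime model at least "sees"
    # what the transcription side produced. `ws` is assumed to be an
    # already-open WebSocket connection to the Realtime API.
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": f"Transcript of your last spoken reply: {transcript_text}",
                }
            ],
        },
    }
    ws.send(json.dumps(event))
```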

Until both the model and the transcription system get better at hearing the audio correctly, it will continue to make these mistakes.