[Realtime API] Audio Output Numbers Wrong

The Realtime API is speaking the wrong numbers in its audio output. Let's say, for example, you ask it to check on your order 5591.

It will often respond with something like: “Sure, let me check on order five hundred and ninety-one for you.”

Prompting it to read out only the individual digits of a number, and giving it examples, reduced the problem so that it happens only about one time in five to one time in ten, but that’s still too often.

Also, the response transcript is still always right. And to its credit, it always takes the input properly. When I say 5591, it always registers it as 5591; it just doesn’t read it out properly.

If anyone has any tips to fix this, they would be greatly appreciated. Overall the model is phenomenal and has very few bugs.


Have you tried feeding it as 5 5 9 1 (with spaces) or converting it into words like five five nine one?
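
A minimal sketch of that idea, assuming the number reaches the model as text you control (for example in a tool result or in the instructions) so you can rewrite it before the model reads it out. The helper name `spell_out_digits` and its parameters are made up for illustration and are not part of the Realtime API; whether this actually stops the misreading is untested here.

```python
import re

# Map each ASCII digit to its English word.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_digits(text: str, as_words: bool = True, min_len: int = 2) -> str:
    """Rewrite each run of at least `min_len` digits digit by digit.

    as_words=True  -> "5591" becomes "five five nine one"
    as_words=False -> "5591" becomes "5 5 9 1"
    (Hypothetical helper, not an API function.)
    """
    def expand(match: re.Match) -> str:
        digits = match.group(0)
        if as_words:
            return " ".join(DIGIT_WORDS[d] for d in digits)
        return " ".join(digits)

    # [0-9] rather than \d so only ASCII digits (the keys above) match.
    return re.sub(r"[0-9]{%d,}" % min_len, expand, text)


if __name__ == "__main__":
    print(spell_out_digits("Checking order 5591 now."))
    print(spell_out_digits("Checking order 5591 now.", as_words=False))
```

You would run any text containing order numbers through this before sending it to the model, so it never sees "5591" as a single number it might read as a quantity.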

Has anybody found a solution for this?

I have encountered the same issue with Italian when using the Realtime API. Numbers are not recognized correctly (and it’s even worse in Italian than in English). While word recognition is accurate, numbers are always misinterpreted. I also noticed that the advanced voice agent in the ChatGPT app has exactly the same problem. I believe the issue lies in the speech-to-text engine used by the model, and I hope they improve it soon. I have tested other speech-to-text engines, and they do not have this issue. Even the Gemini voice agent accurately recognizes numbers in Italian.
