Realtime API nerfed vs Advanced Voice Mode?

I got access to the Realtime API this morning and have been playing around with it in the playground.

The model seems pretty nerfed compared to Advanced Voice Mode. It offers a limited set of voices that have a lot less range than AVM in pronunciation, tonality, and accents.

What’s your experience?
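For reference, here's roughly how voice and accent are controlled in the Realtime API today: you pick one of the launch voices and ask for an accent via the session instructions, which is a prompt, not a dedicated voice feature. A minimal sketch, assuming the beta WebSocket endpoint and `session.update` / `response.create` events from the launch docs (the accent instruction itself is hypothetical and results vary):

```python
# Minimal sketch: requesting an accent through session instructions.
# The only voice control the API exposes is the `voice` field plus prompting,
# which is much narrower than AVM's range.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Note: newer websockets versions use `additional_headers` instead.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Pick one of the launch voices (alloy / echo / shimmer) and ask for
        # an accent in the instructions.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "alloy",
                "instructions": "Speak English with a strong Hungarian accent.",
            },
        }))
        # Request a spoken response so we can hear the result.
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # audio arrives as response.audio.delta events

asyncio.run(main())
```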


You might be right about the accents. I have no problem in AVM (Indian accent), but the Realtime API seems to be having issues with the accent, at least for me. This is preliminary testing.

Beyond this, the Realtime API is amazing. I just created a POC for Realtime API + RAG, and this is going to change a lot of things (like data centers).

Are you doing RAG as part of the instructions or via something else like tools?


100% agree @ssdavid, this is a basic STT → LLM → TTS pipeline like the old voice ChatGPT. It can't do accents or non-verbal communication.

I’m in Europe; are you? Can any American user check whether their API can fake accents and non-verbal cues?

Yep, based in Europe. I asked it to switch to Hungarian. AVM is surprisingly good at it, while standard voice mode is terrible. The Realtime API was a lot closer to SVM.

Haven’t yet checked RAG, but I did play around with function calling. Pricing is crazy, so we'll need to do a lot of architectural cost optimization to make it worthwhile.

The marketing was quite dishonest in cultivating the ambiguity with AVM, probably to justify the pricing.
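To put the "pricing is crazy" point in numbers: a back-of-envelope estimate, assuming the per-minute audio rates published at launch (roughly $0.06/min in, $0.24/min out; check current pricing, and note that conversation history, including audio tokens, is re-billed as input on every turn, so real costs run higher):

```python
# Rough cost for one Realtime API call at launch audio pricing.
# Ignores text tokens and per-turn context re-billing, which add on top.
AUDIO_IN_PER_MIN = 0.06   # user speech
AUDIO_OUT_PER_MIN = 0.24  # model speech

def call_cost(user_minutes: float, model_minutes: float) -> float:
    return user_minutes * AUDIO_IN_PER_MIN + model_minutes * AUDIO_OUT_PER_MIN

# A 10-minute call, roughly half user / half model speech:
print(f"${call_cost(5, 5):.2f}")  # ~$1.50 per call, before context re-billing
```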

Gemini 1.5 Pro 002 plus Whisper for transcription, with a TTS step for output, gives the same performance at 32x lower cost. The only difference is the input latency.
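For anyone who wants to compare, here's a sketch of that chained pipeline: Whisper for STT, Gemini 1.5 Pro 002 for the reply, and a TTS model for output. The model names are real, but treat this as a sketch, and expect each stage to add latency compared to Realtime:

```python
# Chained alternative to the Realtime API: STT -> LLM -> TTS.
# Much cheaper per turn, but latency stacks across the three calls.
import google.generativeai as genai  # pip install google-generativeai
from openai import OpenAI            # pip install openai

oai = OpenAI()
genai.configure(api_key="YOUR_GEMINI_API_KEY")
gemini = genai.GenerativeModel("gemini-1.5-pro-002")

def voice_turn(audio_path: str, out_path: str) -> None:
    # 1. Speech-to-text with Whisper.
    with open(audio_path, "rb") as f:
        text = oai.audio.transcriptions.create(model="whisper-1", file=f).text
    # 2. LLM turn with Gemini.
    reply = gemini.generate_content(text).text
    # 3. Text-to-speech for the response audio.
    speech = oai.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    speech.write_to_file(out_path)

voice_turn("question.wav", "answer.mp3")
```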

Using tools (function calling) to get the context from the RAG (using our own RAG platform). That part is working great … it's the UX challenges that are a pain. A sketch of the wiring is below.
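A minimal sketch of that tool-based RAG wiring, assuming an already-connected WebSocket `ws` (see the earlier snippet). The event names follow the beta Realtime docs as I understand them; the `search_docs` tool name and the `retrieve()` helper are hypothetical stand-ins for your own RAG platform:

```python
# Tool-based RAG over the Realtime API: register a retrieval tool, then feed
# the search results back as a function_call_output when the model calls it.
import json

async def setup_rag_tool(ws):
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "tools": [{
                "type": "function",
                "name": "search_docs",
                "description": "Search the knowledge base for passages "
                               "relevant to the user's question.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }],
            "tool_choice": "auto",
        },
    }))

async def handle_event(ws, event, retrieve):
    # The server signals a finished tool call with its arguments and call_id.
    if event["type"] == "response.function_call_arguments.done":
        query = json.loads(event["arguments"])["query"]
        passages = retrieve(query)  # hypothetical call into your RAG platform
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps({"passages": passages}),
            },
        }))
        # Ask the model to continue speaking now that it has context.
        await ws.send(json.dumps({"type": "response.create"}))
```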

I asked it to respond in Hindi and it did a pretty good job with the language and the accent. (The only odd thing was an American-sounding voice speaking Hindi, but beyond that I was quite surprised at how good the pronunciation was.)

RAG works great with function calling; got it working well. The only problem is controlling hallucinations, so you will need to test and guard against that extensively.
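One prompt-level guard worth trying is a grounding directive in the session instructions. This is a sketch, not a guaranteed fix: it reduces made-up answers but doesn't eliminate them, so evals on your own data are still essential. The `search_docs` tool name matches the hypothetical one above:

```python
# Grounding directive for the session instructions. Prompt-level only:
# it lowers the hallucination rate but does not replace testing.
GROUNDING_INSTRUCTIONS = (
    "Answer questions using only the passages returned by search_docs. "
    "If the passages do not contain the answer, say you don't know and "
    "offer to search again. Never invent product names, prices, or dates."
)
```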

I don’t understand how this isn’t a bigger complaint. This thread is the only complaint about it I could find across the forums, Reddit, Google, and Discord.

Surely more people see the value in the astronomical difference between the existing OpenAI API tech and the tech behind ChatGPT's Advanced Voice Mode. It's genuinely MILES ahead.

I’m curious when we will see it added to the API, but I wouldn’t be surprised to see it at a huge markup. This level of voice AI combined with already-existing RAG tech will revolutionise so many industries, but it’s just not quite there with the current API.


I agree with you.
I was just testing the Realtime API and comparing it with Advanced Voice Mode in ChatGPT, and the experience was so much better with Advanced Voice Mode. I thought I might be doing something wrong, but from this post I can see I’m not the only one with this problem.
Thank you!

Do we have any updates on this? I agree that Advanced Voice Mode in ChatGPT is absolutely miles ahead of the Realtime API. It's not even close. What's going on?