Am I crazy, or is the Realtime API completely different from the Advanced Voice Mode? It really feels like it’s been nerfed in terms of understanding and capabilities. It struggles with understanding French, has trouble with different accents, mixes up languages, and loses track of the conversation more easily.
I use ChatGPT’s Advanced Mode every day in French, and I don’t encounter any problems. But with the Realtime API, after just three messages, it gets completely lost, starts speaking in other languages, or responds to old questions.
A simple example: ask it to speak with a Marseillais accent from France. The Advanced Mode handles it perfectly, but the Realtime API doesn’t even try.
I’ve not had this problem; in fact, I’ve had the Realtime API speak almost fluent Icelandic, a language spoken by only around 400,000 people, which was super impressive and almost unbelievable. It has also handled many other languages — French, Spanish, Chinese — without any problems. Just say “please repeat this in such-and-such a language” and it seems to work each time.
However, I’m not sure if this is your problem: on desktop, with speakers and a mic setup that doesn’t cancel out the speaker audio, I’ve had strange things happen where the model listens to itself, interrupts itself, and then sometimes starts speaking in different languages.
This could also be caused by the user audio not being sent correctly.
Are you:
- Sending a text request to trigger an audio response? This could eliminate the possibility of the input audio being the issue (for testing purposes).
- If sending text works better for you, you might want to try the following: before sending any user audio to the API, save it to a file and listen to it, to check whether you have the sample rate and all other audio requirements right.
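To illustrate the first test, here is a minimal sketch of the two WebSocket events you would send to get an audio reply from pure text input, with no microphone involved. The event names and payload shapes follow the Realtime API's WebSocket protocol as I understand it — double-check them against the current docs, and note that `build_text_turn` is just a helper name I made up:

```python
import json

def build_text_turn(text: str) -> list[str]:
    """Return the JSON event frames to send over the Realtime API WebSocket."""
    # 1) Add a user message containing only text to the conversation.
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    # 2) Ask the model to respond. Requesting audio output lets you test
    #    the voice path while bypassing audio input entirely.
    response_event = {
        "type": "response.create",
        "response": {"modalities": ["text", "audio"]},
    }
    return [json.dumps(item_event), json.dumps(response_event)]

frames = build_text_turn("Réponds en français, s'il te plaît.")
```

If the model behaves well with text input but goes off the rails with mic input, the problem is almost certainly on the audio-capture side.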
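And for the second test, a quick sketch of dumping the raw PCM you are about to send into a WAV file so you can play it back and hear whether it sounds right. This assumes the API's default `pcm16` input format (16-bit little-endian, 24 kHz, mono) — adjust the constants if your session is configured differently:

```python
import wave

SAMPLE_RATE = 24000  # Hz, assumed default for the Realtime API's pcm16 format
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono

def dump_pcm16(pcm_bytes: bytes, path: str = "debug_input.wav") -> None:
    """Write raw PCM16 bytes to a playable WAV file for manual inspection."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)

# Example: one second of silence; replace with your captured mic buffer.
dump_pcm16(b"\x00\x00" * SAMPLE_RATE)
```

If the playback sounds sped up, slowed down, or like static, your sample rate, bit depth, or endianness is off before the audio ever reaches the API.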
I am using the API in German and I have to say that it is definitely “nerfed” in some way — probably a more censored model compared to Advanced Voice Mode (AVM).
This is because AVM is run by OpenAI, so they can handle monitoring and censoring themselves.
With the model they provide through the API, YOU mostly have to take care of the server-side monitoring of prompts, outputs, etc.
Of course, they expect that most users won’t handle monitoring, so they provide a more censored model (which makes the AI in general a bit less reliable).
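For what server-side monitoring could look like, here is a rough sketch of building a request to OpenAI's Moderation endpoint to screen each transcript before you act on it. The endpoint path and payload shape are from the Moderation API docs as I recall them, and the model name is an assumption — verify both before relying on this:

```python
import json

MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_moderation_request(text: str) -> tuple[str, str]:
    """Return (url, json_body) for a moderation check on one piece of text."""
    body = {
        "model": "omni-moderation-latest",  # assumption: check docs for current model
        "input": text,
    }
    return MODERATION_URL, json.dumps(body)

# You would POST this body (with your Authorization header) and inspect the
# returned "flagged" field before forwarding the text onward.
url, body = build_moderation_request("User transcript to screen")
```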
Good luck with your issue, feel free to provide any more info you have.