Am I crazy, or is the Realtime API completely different from the Advanced Voice Mode? It really feels like it’s been nerfed in terms of understanding and capabilities. It struggles with understanding French, has trouble with different accents, mixes up languages, and loses track of the conversation more easily.
I use ChatGPT’s Advanced Mode every day in French, and I don’t encounter any problems. But with the Realtime API, after just three messages, it gets completely lost, starts speaking in other languages, or responds to old questions.
A simple example: ask it to speak with a Marseillais accent from France. The Advanced Mode handles it perfectly, but the Realtime API doesn’t even try.
I’ve not had this problem; in fact, I’ve had the Realtime API speak almost fluent Icelandic, a language spoken by only around 400,000 people, which was super impressive and almost unbelievable. It has also handled many other languages — French, Spanish, Chinese — without any problems. Just say “please repeat this in such-and-such a language” and it seems to work each time.
However, I’m not sure if this is your problem: on desktop, with speakers and a mic setup that doesn’t cancel out the speaker audio, I’ve had strange things happen where the model listens to itself, interrupts itself, and then sometimes starts speaking in different languages.
This could also be caused by the user audio not being sent correctly.
Are you:
- Sending a text request to trigger an audio response? This could eliminate the possibility of the input audio being the issue (for testing purposes).
- If sending text works better for you, you might want to try the following: before sending any user audio to the API, save it to a file and listen to it, to check whether you have the sample rate and all other audio requirements right.
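To illustrate the first test, here is a minimal sketch of the two WebSocket events you would send to get an audio reply from pure text input, with no microphone involved. The event names and payload shapes follow the Realtime API's WebSocket protocol as I understand it — double-check them against the current docs, and note that `build_text_turn` is just a helper name I made up:

```python
import json

def build_text_turn(text: str) -> list[str]:
    """Return the JSON event frames to send over the Realtime API WebSocket."""
    # 1) Add a user message containing only text to the conversation.
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    # 2) Ask the model to respond. Requesting audio output lets you test
    #    the voice path while bypassing audio input entirely.
    response_event = {
        "type": "response.create",
        "response": {"modalities": ["text", "audio"]},
    }
    return [json.dumps(item_event), json.dumps(response_event)]

frames = build_text_turn("Réponds en français, s'il te plaît.")
```

If the model behaves well with text input but goes off the rails with mic input, the problem is almost certainly on the audio-capture side.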
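And for the second test, a quick sketch of dumping the raw PCM you are about to send into a WAV file so you can play it back and hear whether it sounds right. This assumes the API's default `pcm16` input format (16-bit little-endian, 24 kHz, mono) — adjust the constants if your session is configured differently:

```python
import wave

SAMPLE_RATE = 24000  # Hz, assumed default for the Realtime API's pcm16 format
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono

def dump_pcm16(pcm_bytes: bytes, path: str = "debug_input.wav") -> None:
    """Write raw PCM16 bytes to a playable WAV file for manual inspection."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)

# Example: one second of silence; replace with your captured mic buffer.
dump_pcm16(b"\x00\x00" * SAMPLE_RATE)
```

If the playback sounds sped up, slowed down, or like static, your sample rate, bit depth, or endianness is off before the audio ever reaches the API.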
I am using the API in German and I have to say that it is definitely “nerfed” in some way — probably a more censored model compared to Advanced Voice Mode (AVM).
This is because AVM is run by OpenAI, so they can handle monitoring and censoring themselves.
With the model they provide through the API, YOU mostly have to take care of the server-side monitoring of prompts, outputs, etc.
Of course, they expect that most users won’t handle monitoring, so they provide a more censored model (which makes the AI in general a bit less reliable).
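For what server-side monitoring could look like, here is a rough sketch of building a request to OpenAI's Moderation endpoint to screen each transcript before you act on it. The endpoint path and payload shape are from the Moderation API docs as I recall them, and the model name is an assumption — verify both before relying on this:

```python
import json

MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_moderation_request(text: str) -> tuple[str, str]:
    """Return (url, json_body) for a moderation check on one piece of text."""
    body = {
        "model": "omni-moderation-latest",  # assumption: check docs for current model
        "input": text,
    }
    return MODERATION_URL, json.dumps(body)

# You would POST this body (with your Authorization header) and inspect the
# returned "flagged" field before forwarding the text onward.
url, body = build_moderation_request("User transcript to screen")
```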
Good luck with your issue, feel free to provide any more info you have.