Using the Realtime API, we notice that the language sometimes switches automatically.
For example: "Hello, can I speak to Alina?"
GPT Realtime answers me in Italian.
I also notice it changes depending on the name: if I call Anastasia, it switches to Russian; Amir, Arabic. It can't be mishearing my English as all these different languages, so something must be putting weight on the name, enough to change the language of the whole conversation.
I have tried (a lot) to tell it in the prompt:
Language
Start in {{first_language}} unless Call Screening clearly states the participant’s language; if it does, start in that language.
Continuously track the participant’s language on each new input; if it clearly changes, switch to the new language.
Ignore false signals from background noise, names/brands, isolated words/emojis, numbers, or the agent’s name.
But it still ignores these rules, and I see the language change.
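For context, a minimal sketch of how rules like these end up as session instructions over the Realtime WebSocket (the session.update event shape follows the Realtime API docs; send_event is just a placeholder for the actual socket send):

```python
import json

# Placeholder helper: in a real client this would be ws.send(...) on the open
# Realtime WebSocket connection.
def send_event(event: dict) -> None:
    print(json.dumps(event, indent=2))

# The "Language" rules from the prompt above, verbatim.
LANGUAGE_RULES = (
    "Language\n"
    "Start in {{first_language}} unless Call Screening clearly states the "
    "participant's language; if it does, start in that language.\n"
    "Continuously track the participant's language on each new input; if it "
    "clearly changes, switch to the new language.\n"
    "Ignore false signals from background noise, names/brands, isolated "
    "words/emojis, numbers, or the agent's name.\n"
)

# Install the rules as session-level instructions via a session.update event.
send_event({
    "type": "session.update",
    "session": {"instructions": LANGUAGE_RULES},
})
```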
Hello @juberti, thanks for your reply.
Yes, if I say “answer in English” it obeys.
We have now added an option in the UI, "Allow language switching", so users can configure an agent to either stick to the primary language or switch.
But if I say, for example, "I want to speak to Isabella Rossi", it switches to Italian because "Rossi" is an Italian surname. This is the problem I'm having.
In your opinion, what would a simpler prompt look like? I also feel this prompt is complicated, but every time we try to simplify it, things get really bad (by bad I mean it starts switching on background noise, changes the language during silence, and other side effects).
Dear @tleyden, thanks for your reply.
We do send client events for generating responses and for tool responses, but we don't use session.update; we found it does not always work as expected and tends to create problems, so we kept the simplest approach that gives reasonably consistent results. We noticed the model is very sensitive here. We also implemented a tool for detecting the language, and it is called correctly at every turn: each turn we run a language detection that tells OpenAI "create a response in this language" (as @juberti suggested), but in some cases the detected language is simply wrong.
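Concretely, the per-turn flow looks roughly like the sketch below. The detect_language name and schema are illustrative (not our exact ones), send_event stands in for the WebSocket send, and the response.create instructions override is the part @juberti suggested:

```python
import json

def send_event(event: dict) -> None:
    # Placeholder for sending a client event on the Realtime WebSocket.
    print(json.dumps(event, indent=2))

# Tool the model is asked to call on every user turn with the detected language.
DETECT_LANGUAGE_TOOL = {
    "type": "function",
    "name": "detect_language",
    "description": (
        "Report the language the participant is speaking in this turn. "
        "Do not switch based on names, brands, or single foreign words."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "language": {
                "type": "string",
                "description": "ISO 639-1 code, e.g. 'en', 'it', 'he'.",
            }
        },
        "required": ["language"],
    },
}

def handle_detect_language(call_id: str, language: str) -> None:
    """Acknowledge the tool call, then ask for the next response in that language."""
    send_event({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps({"ok": True}),
        },
    })
    send_event({
        "type": "response.create",
        "response": {"instructions": f"Respond in {language}."},
    })
```

The problem is upstream of this code: the value the model passes for `language` is sometimes simply wrong.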
So for now, in order not to delay the release, we have a checkbox, "Allow language switching":
When disabled, the agent sticks to the intro language.
When enabled, it switches languages. Let's see how this goes… Any feedback would be greatly appreciated.
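In case it helps anyone building something similar, the checkbox essentially just picks between two instruction variants (a simplified sketch, not our exact wording):

```python
def build_language_instructions(primary_language: str, allow_switching: bool) -> str:
    """Return the language section of the agent's instructions based on the UI checkbox."""
    if not allow_switching:
        return (
            f"Always respond in {primary_language}. Never switch languages, even if "
            "the user mentions foreign names, brands, or isolated foreign words."
        )
    return (
        f"Start in {primary_language}. Switch only when the participant clearly speaks "
        "a different language for a full sentence; ignore names, brands, isolated "
        "words, numbers, and background noise."
    )
```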
Hi nbo2,
Language switching is quite tricky with the OpenAI Realtime API because the model always tries to guess the language the user is speaking; that's just how an LLM works.
So what is happening here is that when a user pronounces a specific name, the LLM will try to reply in the language associated with that name. This is a big topic and not easy to handle. The safest approach is to limit the session to a single language: add a specific instruction stating exactly that, and set the Whisper language to "en" (English in this case). If you want to go deeper, it's worth exploring tools (functions): define something like set_language that OpenAI should trigger (say so in the instructions) every time it detects the user speaking a different language (en, de, pt, ...), and then update the Whisper language accordingly.
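If it helps, a rough sketch of that pattern is below: a set_language function, and on each call a session.update that pins both the instructions and the transcription language. Field names follow the Realtime API docs as I understand them; whether input_audio_transcription accepts a language field may depend on the API version, so treat that part as an assumption.

```python
import json

def send_event(event: dict) -> None:
    # Placeholder for ws.send(...) on the Realtime connection.
    print(json.dumps(event, indent=2))

# Function the model is told (in the instructions) to call whenever the user
# clearly switches to another language.
SET_LANGUAGE_TOOL = {
    "type": "function",
    "name": "set_language",
    "description": "Call this when the user is clearly speaking a new language.",
    "parameters": {
        "type": "object",
        "properties": {
            "language": {
                "type": "string",
                "description": "ISO 639-1 code: en, de, pt, ...",
            }
        },
        "required": ["language"],
    },
}

def on_set_language(language: str) -> None:
    """Pin both the reply language and the Whisper transcription language."""
    send_event({
        "type": "session.update",
        "session": {
            "instructions": f"Speak only {language} for the rest of the call.",
            "input_audio_transcription": {
                "model": "whisper-1",
                "language": language,  # assumption: language hint supported here
            },
        },
    })
```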
We've built a tool that OpenAI calls on every utterance to determine the speaker's language. The tool is defined with a textual description and a JSON schema, and the model is expected to call it with the detected language at each turn.
However, the language detection is sometimes incorrect.
For example, when I say: “Can I send a voicemail message to Amir?”
…the model unexpectedly switches to Hebrew.
So it ignored the part of the prompt where I tell it "do not switch based on names or single words".
Very challenging, I must say.
Right now we've added a "force starting language" option, as @Joao_Mendonca suggested.
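On top of that we're experimenting with a small guard around the detection output, so a single foreign name can't flip the whole call: require the same detected language on a couple of consecutive turns before switching, and skip the check entirely when "force starting language" is on. A sketch of that heuristic (our own logic, nothing from the API):

```python
class LanguageGuard:
    """Debounce per-turn language detections so one foreign name can't flip the call."""

    def __init__(self, starting_language: str, force_starting: bool, threshold: int = 2):
        self.active = starting_language       # language the agent currently speaks
        self.force_starting = force_starting  # "force starting language" checkbox
        self.threshold = threshold            # consecutive turns needed to switch
        self._candidate = None
        self._streak = 0

    def update(self, detected: str) -> str:
        """Feed the language reported by the detection tool; return the language to use."""
        if self.force_starting or detected == self.active:
            self._candidate, self._streak = None, 0
            return self.active
        if detected == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = detected, 1
        if self._streak >= self.threshold:
            self.active = detected
            self._candidate, self._streak = None, 0
        return self.active
```

With threshold=2, a one-off "it" triggered by "Rossi" is ignored, but two Italian turns in a row still switch the call.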