I just got started using the new Whisper API (the one with the endpoint at
https://api.openai.com/v1/audio/transcriptions). It works incredibly well when it gets the language right, but for some reason it will sometimes give me an Arabic or Hindi transcription.
I’m a native Spanish speaker, so my English pronunciation may have caused the AI to think that I’m speaking another language. But when I translated the Arabic transcriptions into English, the translation was exactly what I said! So the AI actually did understand what I said in English, and then translated it into Arabic. I have zero idea why this is happening. I don’t know if there is a way to specify the languages I want to use, let alone how to tell Whisper not to translate anything I say into another language.
Any idea why it’s doing this and how I can prevent it? I also tried filling the prompt with English text, but that doesn’t seem to make much of a difference.
You could also try setting the prompt parameter to contain a few sentences in English to get it to stay on track, as described in the docs.
Oops, just read this. Hmm. Stumped on this one.
Are you specifying the language parameter? If you know the input language, this will help with more consistent results.
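For reference, here is a minimal sketch of what that request could look like. This assumes the plain REST endpoint from the thread; the helper, file name, and prompt text are placeholders, and as far as the API reference describes, language takes a single ISO-639-1 code, not a comma-separated list.

```python
# Sketch: building the form fields for POST /v1/audio/transcriptions.
# `language` takes one ISO-639-1 code such as "en"; a comma-separated
# list is not a documented value.

def transcription_fields(language="en", prompt=None, model="whisper-1"):
    """Build the non-file form fields for a transcription request."""
    fields = {"model": model, "language": language}
    if prompt:
        # A prompt in the target language nudges the decoder to stay in it.
        fields["prompt"] = prompt
    return fields

# Hypothetical usage with the requests library (API key from the env):
#
# import os, requests
# with open("speech.mp3", "rb") as f:
#     r = requests.post(
#         "https://api.openai.com/v1/audio/transcriptions",
#         headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
#         data=transcription_fields(language="en",
#                                   prompt="The following is plain English."),
#         files={"file": f},
#     )
# print(r.json()["text"])
```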
Yes, I didn’t know about that before but now I’m using just ‘en’. I would like to use multiple languages though, so I wonder if I can use a comma separated list.
Also, I thought about this a bit more, and now I think that maybe it wasn’t translating but just switching character sets, since it’s odd that it never “translated” into a language that uses the Roman script. I’m assuming there is some standard way to map between character sets? This also sounds like the simpler explanation. In any case, the issue hasn’t occurred since I added the language parameter.
This is still happening. I would say something in English and then it will show me what I said, but in Russian.
Yup, it’s still happening to me too. Sometimes it seems to help to speak very slowly. I’m convinced it’s related to the character sets since it always happens to translate into languages that use different characters. I think the AI has a kind of separate “brain area” that decides which charset to use based on your accent, and if it sounds a little Arabic it’ll switch to that charset and then be forced to translate into any language that uses this charset so that the output sentence makes sense.
Having a prompt that mixes the two languages in roughly equal amounts seems to help a bit for me, though it’s not perfect. In my case, the user might speak Japanese only, English only, or both.
For example, my prompt was “私の名前は山本です。My name is Yamamoto.”.
Then I tried saying “よく覚えてないんですけど、I think I did it for around 5 years.” (“I don’t remember it well, but I think I did it for around 5 years.”), and Whisper returned the expected response in both Japanese and English: よく覚えてないんですけど、I think I did it for around 5 years.
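If you go with the mixed-prompt strategy above, a quick sanity check that the prompt really contains both scripts before sending it might look like this. This is just a sketch using the standard unicodedata module; the function name and the two script labels are my own.

```python
# Sketch: verify a bilingual prompt contains both Latin and Japanese text.
import unicodedata

def scripts_present(prompt: str) -> set:
    """Return which of {'latin', 'japanese'} appear in the prompt."""
    found = set()
    for ch in prompt:
        if ch.isascii() and ch.isalpha():
            found.add("latin")
        else:
            # Hiragana, katakana, and kanji all carry these words in
            # their official Unicode character names.
            name = unicodedata.name(ch, "")
            if "HIRAGANA" in name or "KATAKANA" in name or "CJK" in name:
                found.add("japanese")
    return found

prompt = "私の名前は山本です。My name is Yamamoto."
assert scripts_present(prompt) == {"latin", "japanese"}
```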