Whisper is translating my audio for some reason

I just got started using the new Whisper API (the one with the endpoint at https://api.openai.com/v1/audio/transcriptions). It works incredibly well when it gets the language right, but for some reason it will sometimes give me an Arabic or Hindi transcription.

I’m a native Spanish speaker, so my English pronunciation may have caused the AI to think that I’m speaking another language. But when I translated the Arabic transcriptions into English, the translation was exactly what I said! So the AI actually did understand what I said in English, and then translated it into Arabic. I have zero idea why this is happening. I don’t know if there is a way to specify the languages I want to use, let alone how to tell Whisper not to translate anything I say into another language.

Any idea why it’s doing this and how I can prevent it? I also tried filling the prompt with English text, but that doesn’t seem to make much of a difference.

You could also try setting the prompt parameter to contain a few sentences in English to get it to stay on track, from the Docs:

Oops, just read this. Hmm. Stumped on this one.

Are you specifying the language parameter? If you know the input language, this will help with more consistent results.
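In case it helps, here is a rough sketch of passing the language parameter to that endpoint from Python. It assumes the third-party requests package, an API key in an OPENAI_API_KEY environment variable, and the "whisper-1" model; the helper names build_form_fields and transcribe are made up for this example.

```python
import os

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_form_fields(language=None, prompt=None):
    """Return the non-file form fields for a transcription request."""
    fields = {"model": "whisper-1"}
    if language:
        fields["language"] = language  # ISO-639-1 code, e.g. "en" or "es"
    if prompt:
        fields["prompt"] = prompt  # optional text hint in the expected language
    return fields


def transcribe(path, language=None, prompt=None):
    """Upload one audio file and return the transcribed text."""
    import requests  # third-party; imported lazily so build_form_fields has no dependencies

    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers=headers,
            data=build_form_fields(language, prompt),
            files={"file": audio},  # sent as multipart/form-data
            timeout=120,
        )
    response.raise_for_status()
    return response.json()["text"]  # default response format is JSON with a "text" key
```

Something like transcribe("note.mp3", language="en") should then keep the output in English; "note.mp3" is just a placeholder file name.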

Yes, I didn’t know about that before, but now I’m using just ‘en’. I would like to use multiple languages though, so I wonder if I can use a comma-separated list.

Also, I thought a bit more about this and now think that maybe it was not translating but just switching character sets, since it’s weird that it never translated into a language that uses the Roman script. I’m assuming there is some standard way to map between character sets? This also sounds like a simpler explanation. But the issue hasn’t occurred since I added the language parameter.
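On the multiple-languages question: as far as I can tell, the language parameter takes a single code rather than a comma-separated list. One possible workaround is to leave it unset, request response_format="verbose_json" (which includes a "language" field with the detected language), and only retry with a forced language when the detection falls outside the set you expect. A minimal sketch of that check follows; the ALLOWED set, the fallback, and the function name retry_language are all assumptions for illustration, and I'm not certain whether the field comes back as a name like "english" or a code like "en", so both are listed.

```python
# Assumption: the detected language may be reported as a name or as a code.
ALLOWED = {"english", "en", "spanish", "es"}


def retry_language(detected, allowed=ALLOWED, fallback="en"):
    """Return None if the detected language is acceptable,
    otherwise the language code to force on a second request."""
    if detected and detected.strip().lower() in allowed:
        return None
    return fallback


# Usage idea: after a first verbose_json request, check the reported language.
for reported in ("english", "Arabic", None):
    forced = retry_language(reported)
    if forced is None:
        print(f"{reported!r}: keep the first transcription")
    else:
        print(f"{reported!r}: retry with language={forced!r}")
```

This costs a second request whenever detection goes wrong, but it avoids hard-coding one language for every recording.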