Whisper is translating my audio for some reason

I just got started using the new Whisper API (the one with the endpoint at https://api.openai.com/v1/audio/transcriptions). It works incredibly well when it gets the language right, but for some reason, it will sometimes give me an Arabic or Hindi transcription.

I’m a native Spanish speaker so my English pronunciation may have caused the AI to think that I’m speaking another language. But when I translated the Arabic transcriptions into English, the translation was exactly what I said! So the AI actually did understand what I said in English, and then translated it into Arabic. I have zero idea why this is happening. I don’t know if there is a way to specify the languages I want to use, let alone how to tell Whisper not to translate anything I say into another language.

Any idea why it’s doing this and how I can prevent it? I also tried filling up the prompt with English text but that doesn’t seem to make much of a difference.

You could also try setting the prompt parameter to a few sentences in English to keep it on track, as described in the docs.

Oops, just read this. Hmm. Stumped on this one.

Are you specifying the language parameter? If you know the input language, this will help with more consistent results.

Yes, I didn’t know about that before but now I’m using just ‘en’. I would like to use multiple languages though, so I wonder if I can use a comma separated list.

Also, I thought a bit more about this and now think that maybe what happened is not that it was translating but just switching character sets, since it’s weird that it never translated into a language that uses the Roman script. I’m assuming there is some standard way to map between character sets? This also sounds like a simpler explanation. But the issue hasn’t occurred since I added the language parameter.

This is still happening. I would say something in English and then it will show me what I said, but in Russian.

Yup, it’s still happening to me too. Sometimes it seems to help to speak very slowly. I’m convinced it’s related to the character sets since it always happens to translate into languages that use different characters. I think the AI has a kind of separate “brain area” that decides which charset to use based on your accent, and if it sounds a little Arabic it’ll switch to that charset and then be forced to translate into any language that uses this charset so that the output sentence makes sense.

Having a prompt that contains two languages in roughly equal amounts seems to help a bit for me. It’s not perfect, though. In my case, the user might speak Japanese only, English only, or both.

For example, my prompt was “私の名前は山本です。My name is Yamamoto.”.

Then, I tried saying “よく覚えてないんですけど、 I think I did it for around 5 years.” and whisper returned an expected response in both Japanese and English. (よく覚えてないんですけど、 I think I did it for around 5 years.)

I faced the same issue. I am from Malaysia. When I speak in English, it shows me Malay in the transcription. The meaning is exactly the same as what I said in English.

Same here! It seems to occasionally translate my English input into French (note that I do have a French accent when I speak English :))

I wonder if a resolution has been found?

It seems that whenever Whisper detects an accent, it translates from English into the speaker’s native language. We’ve observed it with Japanese, Spanish, Italian, Russian, and Indonesian.

We want a consistent transcription in English and this problem is a deal breaker for us.
(Interesting behaviour though.)

I’m having the same trouble. When I speak in English, it keeps converting it to Hindi. I’m from India, so I suppose my accent might have some Hinglish influence.

Any solutions so far?

Welcome @harshvadhiya

If the language of the audio is known, use the language parameter to specify it in the ISO-639-1 format in your API call. This will ensure that the model only transcribes in the specified language and also increase the accuracy.
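For anyone landing here later, a minimal sketch of what that looks like against the endpoint mentioned at the top of the thread. It assumes the third-party requests package, an OPENAI_API_KEY environment variable, and the whisper-1 model name. The helper names (build_form_fields, transcribe) and the two-letter check are made up for illustration; they are not part of any official client.

```python
import os

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_form_fields(language=None, prompt=None, model="whisper-1"):
    """Return the non-file form fields for a transcription request."""
    fields = {"model": model}
    if language is not None:
        # Illustrative sanity check: expect a single two-letter ISO-639-1 code
        # such as "en", "es" or "ja".
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        fields["language"] = language.lower()
    if prompt:
        # Optional text that nudges the style of the transcript.
        fields["prompt"] = prompt
    return fields


def transcribe(path, language=None, prompt=None):
    """Upload an audio file and return the transcribed text."""
    import requests  # third-party; imported lazily so the helper above stays dependency-free

    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers=headers,
            data=build_form_fields(language, prompt),
            files={"file": audio},
            timeout=120,
        )
    response.raise_for_status()
    # The default response format is JSON with a "text" field.
    return response.json()["text"]
```

Usage would be something like transcribe("output.wav", language="en"). Leaving language unset lets the model guess, which is the situation where the unexpected languages were reported in this thread.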

Hi @sps

Thank you for answering, I tried another thing and it worked for me.

transcription = whisper_transcriber(r"output.wav", generate_kwargs = {"task":"transcribe", "language":"en"} )

The generate_kwargs parameter worked to force transcription into English only.
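The snippet above doesn’t show where whisper_transcriber comes from. My guess is that it is a Hugging Face transformers speech-recognition pipeline, so here is a rough sketch under that assumption. The model name openai/whisper-small and the function names are my own picks, not something from the post.

```python
def forced_language_kwargs(language="en"):
    """Generation arguments that pin Whisper to transcribing in one language."""
    return {"task": "transcribe", "language": language}


def build_transcriber(model_name="openai/whisper-small"):
    """Create a speech-recognition pipeline (downloads the model on first use)."""
    from transformers import pipeline  # third-party, imported lazily

    return pipeline("automatic-speech-recognition", model=model_name)


def transcribe_file(transcriber, path, language="en"):
    """Run the pipeline on one file and return only the text."""
    result = transcriber(path, generate_kwargs=forced_language_kwargs(language))
    return result["text"]
```

With that in place, transcribe_file(build_transcriber(), "output.wav") should behave like the one-liner above. Setting "task" to "transcribe" explicitly matters here because Whisper also has a "translate" task.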

Same issue as @harshvadhiya; I’m Indian too. However, using the language parameter would limit the transcription to one specific language, and what I am working on requires the transcription to be in whatever language is spoken. That seems far from perfect at the moment.

You can use your own classifier to determine the language and set the language parameter that way.
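A rough sketch of that idea, assuming you already have some classifier that returns a probability per language code (where those numbers come from is up to you). pick_language, the allowed set, and the 0.5 threshold are hypothetical choices for illustration. The point is to only ever choose among the languages you actually expect, and to fall back to a default when the classifier is unsure.

```python
def pick_language(probabilities, allowed, fallback="en", min_confidence=0.5):
    """Choose the language code to send as the `language` parameter.

    probabilities: mapping of language code -> probability from your classifier
    allowed: the set of language codes your users are expected to speak
    """
    # Ignore anything outside the expected languages, however confident.
    candidates = {lang: p for lang, p in probabilities.items() if lang in allowed}
    if not candidates:
        return fallback
    best = max(candidates, key=candidates.get)
    # Only trust the classifier when it is reasonably sure.
    return best if candidates[best] >= min_confidence else fallback
```

So for an app that expects only English and Japanese, an accented English clip that the classifier scores as mostly Russian would still be sent with the fallback language rather than "ru".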
