[SOLVED] Whisper translates into Welsh

Just setting the language (which wasn’t set before) solved it in my case. But I agree with the others who said it makes no sense that the API can understand English and then output a language different from the one it hears.

In your example you talk about one person learning American English and hearing a British person, which is not the problem in this bug. The bug is more like this: you learn American English, you hear a British person, and you transcribe what they say perfectly, but in Chinese. So in this case it’s not that you do not understand, but that you decided to translate into a totally different language from the one you hear. If you had tried to write down Chinese directly from what the British person said, there would be no match between the languages.

As _i said, they have added some fine-tuning for the ISO language codes, so you can use those as well as language names; this may even improve edge cases.

Yes, it’s a combination that improves the chance of correctly transcribing the audio. It’s not one-size-fits-all. Ideally you would do everything correctly: set the language, set the prompt, and ensure the audio quality is high.
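A minimal sketch of that combination, assuming the v1-style OpenAI Python SDK and an `OPENAI_API_KEY` in the environment. The helper names, the file name, and the prompt text are hypothetical; only `model`, `file`, `language`, and `prompt` are parameters of the transcription endpoint itself.

```python
from typing import Optional


def build_transcription_params(language: Optional[str] = "en",
                               prompt: Optional[str] = None) -> dict:
    """Collect the optional hints that are sent alongside the audio file."""
    params = {"model": "whisper-1"}
    if language:
        # ISO 639-1 code: tells the model which language to expect
        # instead of leaving it to guess from the audio.
        params["language"] = language
    if prompt:
        # Short text in the expected language and style (assumed wording).
        params["prompt"] = prompt
    return params


def transcribe(path: str, **hints) -> str:
    """Send one audio file to the API and return the transcript text."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            file=audio, **build_transcription_params(**hints)
        )
    return result.text


# Example call (hypothetical file name and prompt):
# transcribe("episode.mp3", language="en", prompt="A podcast episode in English.")
```

Leaving `language` unset puts you back in the situation described in this thread, where the model has to guess the language by itself.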

Yes, it is. Accents (and lower-quality audio) make it harder for Whisper to predict the language that’s being spoken.

A human-like example is how someone learning English can understand American English at a high-school level, but cannot understand British English at all because of the accent. They may not even be able to tell that the person is speaking English.

Yes, this is usually because of the accent. I don’t know why you brought in Chinese when you can just simply say “Welsh” as it is here. Also, you are just rephrasing what I said while disagreeing with me, and then making it more complicated by bringing in Chinese (?)

This is usually referred to as “not understanding”.

If you don’t set the language, the model tries to predict the language at the start of the chunk. So, yes, it does make sense. Again, this is why it incorrectly predicts the language as Welsh when the audio quality is poor and/or the person has an accent.
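If you run the open-source `whisper` package locally, you can look at that prediction yourself. A rough sketch, following the language-detection example in the package README: `detect_probs` returns the per-language probabilities for the first 30 seconds, and `pick_language` is a hypothetical helper (the 0.8 threshold is an arbitrary assumption, not something Whisper does) that falls back to the language you expect when the guess is not confident.

```python
def pick_language(probs: dict, expected: str = "en",
                  min_confidence: float = 0.8) -> str:
    """Keep the detected language only if the model is confident enough."""
    detected = max(probs, key=probs.get)
    if probs[detected] >= min_confidence:
        return detected
    # Low-confidence guess (accent, poor audio): use the expected language.
    return expected


def detect_probs(path: str, model_name: str = "base") -> dict:
    """Return Whisper's language probabilities for the start of the audio."""
    # Imported lazily so pick_language can be used without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    # Detection only looks at one 30-second window.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return probs


# Example (hypothetical file name):
# language = pick_language(detect_probs("episode.mp3"))
# text = whisper.load_model("base").transcribe("episode.mp3", language=language)["text"]
```

If the probabilities for a clip are split between `en` and `cy`, that is the situation where the unset-language path can end up producing Welsh.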

If you want to play around with the same dataset that OpenAI used, you can do so here:

For fun, I tried in Spanish:

Screenshot from 2023-10-04 14-39-01

And, interestingly enough when I tried in a (terrible) British accent, I get a little bit of Welsh:

Screenshot from 2023-10-04 14-40-14


If you read the scientific paper describing Whisper, you will see that Welsh is the 4th most represented language in their dataset, ahead of French, for example! Apparently a lot of English audio was labelled as Welsh (perhaps because it was extracted from resources in Wales that were in English?). That would explain why the model hears English and sometimes thinks it is Welsh.

Summary created by AI.

In this lengthy discussion, users primarily grappled with an anomaly in the Whisper transcription service: it purportedly translated one of pudepiedj’s podcast episodes into Welsh despite the input audio’s language being English. Intriguingly, he observed this phenomenon despite specifying ‘English’ as the language in the API call.

One potential cause posited for this behavior was the Whisper model mistaking certain English accents for Welsh. Users such as linus and curt.kennedy proposed workarounds like specifying the language in the API settings and using prompts to get correct transcriptions.

Notably, pudepiedj succeeded in running transcriptions locally on his Mac using PyTorch with his AMD Radeon Pro 5500M GPU. This involved upgrading macOS to at least Ventura 13.0 and installing pytorch-nightly via Conda. The approach drastically cut transcription time, implying that the GPU was actually being used. To help users dealing with similar problems, iamflimflam1 recommended using whisper.cpp for local execution and GPU support on Macs.

As the discussion continued, subirats345 asked if it was possible to configure the language using the API via Node.js.

Summarized with AI on Nov 24 2023
AI used: gpt-4-32k