[SOLVED] Whisper translates into Welsh

Thanks Chris, this is a very useful resource that cuts transcription time down considerably. An 18-minute audio file, once converted to 16-bit WAV, transcribed using the base.en model in 57 seconds, and using tiny.en in 30 seconds; both are about 5 times faster than the CPU Python version.


Faced a similar problem using Whisper.
An audio recording where English is spoken with a Turkish accent was transcribed and translated (correctly) into Turkish via openai.Audio.transcribe.
The same model behavior was noticed when transcribing recordings of English speech with a Russian accent.

I deliberately don’t pass the language as a parameter.

I also convert the audio from ogg → mp3 and don’t do any preprocessing on it.

I tried several prompts like “This is an audio recording with an accent; don’t translate it.” None of them made any difference.

Has anyone faced a similar problem?

Hi

I am still really struggling with some recordings going into Welsh.

I am hoping someone from the OpenAI team (or anyone else) may see this and have some advice.

Here is the simple test code I am using for testing purposes (you will need to add your own key), and I have linked to the audio file that is causing the issues.

import openai
from tkinter.filedialog import askopenfilename
import pymsgbox

api_key = "your_api_key"  # replace with your own key

chosen_file = askopenfilename()
openai.api_key = api_key

with open(chosen_file, "rb") as audio_file:
    # language and temperature are top-level parameters; nesting them in an
    # "options" dict means the API never receives them.
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        prompt="I am English, always transcribe in English",
        language="en",
        temperature=0,
    )

raw_text = transcript.text

print(raw_text)
pymsgbox.alert(raw_text)

Audio file that goes to Welsh

Thanks in advance!

Hi @justin3,
There are many suggestions in this thread, which I started several months ago, and some people think the problem is solved, but I’ve basically given up on using the API for translation. Nevertheless, running the GitHub repository code in Google Colab seems to work without issue, and I’ve had success running the API from a Jupyter Notebook without finding Welsh in the output. My preferred method now, which works so well that I don’t look back at the others, is to use Georgi Gerganov’s whisper.cpp port, which has been posted here before.
Sorry I can’t be more helpful, but if you look back through this long thread you will see many suggestions. It isn’t clear how many of them you have tried.


YOU ARE THE GOAT my friend, thanks, this helped me completely

Summary created by AI.

User pudepiedj ran transcriptions of his podcast, Unmaking Sense, via Python3 using OpenAI’s Whisper. Unexpectedly, one episode was translated into something resembling Welsh. Users discussed ways to ensure English transcription, such as specifying the language in API settings. Some users noted that the model could misinterpret certain accents as different languages. Despite specifying ‘en’ for English in the API call, pudepiedj’s audio was still transcribed into “Welsh”. He suggested an issue with the transcription API not properly recognizing the specified language. Users also suggested the use of prompts. pudepiedj managed to get transcripts working locally on his Mac using PyTorch and his AMD Radeon Pro 5500M GPU. User iamflimflam1 recommended using whisper.cpp for those experiencing difficulties with local execution and GPU support on Macs.

I am still occasionally getting reports of translations into Welsh or problems with the MPS back end.
After a lot of thrashing about, I am convinced that the best available solution, at least on Apple, is to use Georgi Gerganov’s port of Whisper to C/C++ at whisper.cpp. I have now implemented the CoreML version on Apple Silicon M2, and a 15-minute file transcribes in 10 seconds using base.en. And it’s in English!
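For anyone trying the same route, here is a rough sketch based on the whisper.cpp README (the script name, build flag, and paths come from its CoreML instructions, so check the repository for current versions; note that whisper.cpp expects 16-bit, 16 kHz WAV input):

./models/generate-coreml-model.sh base.en   # one-off: generate the CoreML encoder
WHISPER_COREML=1 make -j                    # rebuild with CoreML support
ffmpeg -i episode.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav
./main -m models/ggml-base.en.bin -f episode.wav -l en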

Summarized with AI on Sept 6

"Mae’r defnyddiwr pudepiedj wedi trawsgrifio podlediad gan ddefnyddio lwp Python3 a API Whisper OpenAI. Roedd y trawsgrifiadau yn llwyddiannus i’r rhan fwyaf ond am un bennod lle’r oedd cynnwys Saesneg wedi’i gyfieithu’n anghywir i’r Gymraeg. Mae’r cyfieithiad annisgwyl hwn wedi peri dryswch i pudepiedj gan nad y Gymraeg oedd y iaith a ddewiswyd ar gyfer y trawsgrifiad ac nid oeddent yn siŵr pam y digwyddodd hyn. Cynigiwyd amryw o ddewision i ddefnyddio’r paramedr iaith yn bennaf gan ddefnyddio --iaith neu gan ddefnyddio gwasanaethau canfod iaith trydydd parti, ond nid oedd pudepiedj yn llwyddo gyda’r atebion hyn.

Nododd pudepiedj hefyd fod fersiynau Whisper hŷn yn darparu mwy o fathau o ffeiliau megis is-deitlau a chyfresi amser, ac holodd pam ymddengys eu bod wedi’u diweddaru yn ddiweddar.

Daeth profiad y defnyddiwr gyda Whisper ar Google Colab i ganlyniad gwell trawsgrifiad Saesneg gydag ammetadata defnyddiol a darparodd ffyrdd i olygu sain. Wedi hynny, darganfuodd ateb i’r broblem gyda’r Gymraeg trwy ddimio’r cyfnod cyntaf 30 eiliad o’u ffeil sain gan ddefnyddio ffmpeg.

Astudiodd defnyddwyr eraill effaith bosodiau ac offonoleg y llefarwr yn y ffeil sain wreiddiol. Awgrymodd defnyddiwr curt.kennedy y syniad o ychwanegu brawddeg i gyfarwyddo’r trawsgrifiad i’r iaith benodol. Awgrymodd defnyddiwr arall, sy1, osod y paramedr iaith i en i ddatrys y broblem, ond nid oedd hyn yn datrys y broblem i pudepiedj.

Cafodd pudepiedj llwyddiant wrth gael Whisper i weithio’n lleol gan ddefnyddio PyTorch a GPU, gan leihau amser y trawsgrifiad yn sylweddol. Awgrymodd defnyddiwr arall, iamflimflam1, i’r rhai oedd yn cael trafferth gyda chefnogaeth GPU lleol ar Macs i edrych ar whisper.cpp, gweithrediad C++ o system Whisper ASR."
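For reference, Whisper detects the language from roughly the first 30 seconds of audio, which is presumably why the trim described above helped. The ffmpeg step would look something like this (filenames are placeholders):

ffmpeg -ss 30 -i episode.wav -c copy episode-trimmed.wav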

This is happening a lot to me!

Best solution, although hacky: if your source audio is in English, then just use the translate endpoint instead of the transcribe one.
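A minimal sketch of that workaround with the pre-1.0 openai Python library (the filename is a placeholder). The translate endpoint always targets English, so English speech should come back as English text:

import openai

openai.api_key = "your_api_key"  # placeholder

with open("episode.mp3", "rb") as audio_file:
    # Unlike transcribe, translate always outputs English,
    # regardless of what language Whisper thinks it hears.
    result = openai.Audio.translate(model="whisper-1", file=audio_file)

print(result.text)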

I ran into this problem today. I pasted the gibberish into GPT-4 and it told me it’s Welsh - wasn’t expecting that. The two speakers in the audio file are speaking with an English accent, and based on earlier comments that could have triggered the translation rather than just the desired transcription. Other files with the same two speakers were transcribed correctly - into English. The file at issue has a little background noise in the beginning. Maybe just enough to cause problems?

I did resolve my problem with this one file by converting it from WAV to MP3. When I submitted the MP3, the transcription was correct. It is my intent to only submit MP3s, as they are 1/10 the size of WAV files - just don’t have that implemented in my app yet. I haven’t researched it, but maybe the conversion to MP3 has a side effect of removing some noise? To my ear there is no difference. For now my problem is solved.
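If anyone wants to do the same conversion before uploading, a single ffmpeg call is enough (filenames and bitrate are placeholders):

ffmpeg -i recording.wav -b:a 128k recording.mp3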

Hello guys! My problem is worse than the Welsh thing. Whisper is inferring the accent of the speaker and translating the phrase into that language! For instance, if my wife, who is Japanese, asks “How are you?” in English, I’m getting that same phrase back, but in Japanese!

Anyone experienced that?

Hi and welcome to the Developer Forum!

Are you setting the language type to English/Japanese or letting it decide?


I’m assuming the Welsh issue is exactly that. When there is a certain accent in English, it translates to Welsh. At least that’s what is happening to me. I have an audio file in English and it does translate to Welsh.


Thanks for the response!

I don’t have the language configured. Is it possible to set it when consuming the API via Node?

It seems like magic, but it’s not convenient in production :sweat_smile:

Add a prompt that is an “introduction” in the language you want to be spoken. You can also set the tone and professionalism (do you want the transcript to include “Uhhhh…”?), and capture any specific words/phrases that Whisper may have trouble with.

I’ve noticed that some people mistake the prompt for being instructional, but it’s really a prefix to your transcript. Let me repeat: the underlying model is NOT trained for instructions.
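A small sketch of the difference (both strings are made-up examples):

# Treated as transcript text, so this does NOT behave as an instruction:
instruction_prompt = "Always transcribe in English. Do not translate."

# More effective: text that could plausibly precede the audio itself.
prefix_prompt = "Welcome back to the show. Today we are talking about Whisper and the OpenAI API."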

Agreed. It also seems that sometimes the audio loses quality, which can also confuse the model into believing that it’s a different language. I think anybody with an accent will have difficulties and really needs to be careful with their audio. I mean, just consider someone who is learning American English and then hears a British person speak.

It’s very hard.

Not sure what API you are using for Node, but in Python there is a language parameter:
result = model.transcribe(filepath, language="Japanese", fp16=False, verbose=True)

or from the command line

whisper japanese.wav --language Japanese --task translate

which translates the speech into English.

The language API parameter for Whisper is a two-letter ISO 639-1 code. The API docs link to the full list.

The prompt is previous transcription that leads up to the input. It can be something like: “I am Nigel from London, and I only speak English. Let me continue where we left off.”
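Putting the two together in a pre-1.0 openai Python call (the filename is a placeholder; the prompt is the example above):

import openai

openai.api_key = "your_api_key"  # placeholder

with open("interview.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        language="en",  # two-letter ISO 639-1 code, not "English"
        prompt="I am Nigel from London, and I only speak English. Let me continue where we left off.",
    )

print(transcript.text)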

Just setting the language (which wasn’t set before) solved it in my case. But I agree with others who said it makes no sense that the API can understand English and output in a language different from what it hears.

In your example you talk about one person learning American English and hearing a British person, which is not the problem related to the bug. The bug is more like: you learn American English, you hear a British person, and you transcribe them perfectly as Chinese. So it’s not that the model doesn’t understand; it’s that it decided to translate into a totally different language from the one it heard. If it had tried to transcribe Chinese directly from what the British person said, there would be no match between the languages.

As _i said, they have added some fine-tuning for the ISO language codes, so you can use those as well as language names; that may even improve edge cases.