[SOLVED] Whisper translates into Welsh

Unfortunately, since Apple had their little tiff with Nvidia, I'm unable to use the AMD Radeon Pro 5500M GPU on my MacBook except by running things in Xcode and Swift, because CUDA is no longer supported. If there's a way to run the open-source Whisper like that, please tell me, but I haven't found one.

Otherwise, running the open-source Whisper would be a viable option. Running it on the CPU does work, and it transcribes the pseudo-Welsh file in its original complete form perfectly, but it takes so long you could go out for the day while it runs, so it's not a sustainable option except for experimental purposes.

I have also rerun the “verbose_json” request against the OpenAI Whisper API on the trimmed file, and all I got back was the text. There were no other entries in the JSON. Has this been your experience?

So I think it is essentially broken, and certainly not fit for purpose. :face_with_raised_eyebrow:


Hi @justin3,

to share something in private, go to the user’s profile page (by clicking on the name); there you’ll find a “Message” button that lets you contact someone directly.

If you are new to the forum you may not have access to this feature yet, in which case you could ask the other user to initiate a conversation the way I described, and then you can message them directly.

Hope that helps :slight_smile:

Hi @pudepiedj,

thanks for sharing that the 30-second tactic worked!

Regarding the issues you ran into with the OpenAI version, I can only advise following what @curt.kennedy mentioned. Maybe a way to improve the whole situation would be to set up a cloud service and build your own API, or to use a VM to better leverage your hardware. Regarding the VM, this is just off the top of my head: I’m not on a Mac and I don’t know how well a VM can leverage the GPU. Alternatively, I heard from some friends of mine that they dual-boot, installing Linux and then using that for the transcriptions.


That is strange. Could it be that they are decoding or compressing the audio file differently?

No offense to my fellow Welsh speakers, but I can totally see the spectrogram confusing fuzzy English with Welsh.

I would still be interested in seeing what happens with a longer prompt. It seems that Whisper, being GPT-based, is a path paved block by block: if the first block’s direction is off, so is the rest.

@justin3 You should be able to send me a private message

I am experiencing the exact same problem on an app (MVP).

It had been working fine for a week, and tonight I got three Welsh transcripts in a row.

Seems my app launch date will be delayed…

Right. This is hilarious. I believe I’m seeing something.

@pudepiedj @brianbray01 @acoloss @jeffinbournemouth @justin3

Do you all have accents? More specifically, European or Australian accents?
It’s quite interesting: a lot of people I know who are learning English say that a British accent is very hard to understand at times. I have a hunch that if I played both Welsh and British English for someone who didn’t understand English, they wouldn’t be able to tell the difference very well.

An interesting video:

What English sounds to foreigners

Welsh

I wonder how similar the spectrograms of these two would be.


I’m British, and have never experienced issues like this with Google speech-to-text.

I think it must be a bug in the algo.

Ha.
Unfortunately no one in the world sounds Welsh; there is nothing similar. I’m British (Northern English, and nothing sounds like Welsh even though I’m not far away).
Also, it actually translates the “English” into “Welsh”, so the output makes sense in Welsh, which means it does know it’s English…
Welsh is a very distinctive language and I don’t think many of its words are similar to English ones…

English is more similar to German than it is to Welsh :smiley:

Here I asked: “Hello John, have you met my friend Savni?”
and it was transcribed as: Helo John, a wnes i gwrdd â’n ffrind Safni?

which means the same thing, but as you can see they look very different.

So @justin3 is also British (I’m sorry if you’re Australian or something else; I am no accent expert!). He sent me a sample and it immediately lit a light bulb in my head.

@acoloss
Whisper does not actually listen to phonetics the way we do. I’d compare it more to a dog, which listens to pitch rather than words. This may be completely wrong as well; it’s just what I’ve roughly gathered from my limited time looking into it.

No way this is a coincidence

I think you might be onto something, @RonaldGRuckus, with non-American-accented English speakers having the issues. But in no way does any English speaker sound Welsh to me! (As an American.)

Not even the video you shared of non-English speakers making up English sounds remotely Welsh; they sound American (and not British either)!

Welsh is its own thing. But logically, you must be correct, since the problem appears with British-sounding accents, and Britain and “Welsh country” are in the same geographic locale and in theory should sound similar. So your obvious logical conclusion stands.

I made a small edit, if that helps.


The issue appears to be in the voice-to-text implementation of the OpenAI Python library. When I ran the same audio files through the whisper library directly (`model = whisper.load_model("base")` then `result = model.transcribe(audioFile)`), there were no issues.
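For anyone who wants to reproduce the comparison, here is that direct-library call spelled out as a small self-contained sketch (it assumes the open-source `whisper` package is installed via `pip install openai-whisper`, along with ffmpeg; the model name and file path are just illustrative):

```python
def transcribe_local(audio_file: str, model_name: str = "base") -> str:
    # Sketch: requires `pip install openai-whisper` and ffmpeg on PATH.
    # Import is inside the function so the snippet reads standalone.
    import whisper

    model = whisper.load_model(model_name)  # downloads the checkpoint on first use
    result = model.transcribe(audio_file)
    return result["text"]
```

The larger checkpoints (“small”, “medium”) are slower but tend to identify the language more reliably on muffled audio, which is relevant here.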


More importantly, do you have an accent?

I totally feel like this is a non-issue as the open-source version works, but it’s still fun to play with.


My recording is of a doctor with an ‘RP’ British accent. He usually uses Apple dictation successfully. I have to say the beginning of my recording is pretty muffled. The patient he was talking about was called Mr Jones (a typically Welsh name), but I don’t think even Whisper would spot that :slight_smile:

I may be mistaken, but the first 30 seconds are used to determine the language and whether to transcribe or translate it, which is why a sufficiently long introductory prompt that uses keywords is very important.
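If the detection window really is the culprit, the simplest workaround is to skip auto-detection entirely by pinning the language in the request. A minimal sketch against the OpenAI API (assuming the `openai` Python package of that era with its `Audio.transcribe` method, an `OPENAI_API_KEY` in the environment, and an illustrative prompt text):

```python
# Settings that bypass language auto-detection: `language` pins the
# ISO-639-1 code, and `prompt` biases decoding toward English text.
TRANSCRIBE_KWARGS = {
    "model": "whisper-1",
    "language": "en",
    "prompt": "A British English speaker dictating notes.",
}

def transcribe_pinned(path: str) -> str:
    import openai  # imported lazily; requires OPENAI_API_KEY to be set

    with open(path, "rb") as f:
        return openai.Audio.transcribe(file=f, **TRANSCRIBE_KWARGS)["text"]
```

With `language="en"` fixed, Whisper should never get the chance to decide the muffled opening sounds Welsh.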

It’s not about the phonetics, or even the words themselves. It’s how the audio is represented in a spectrogram.

Interesting. I found them similar, but I am not American.
It also doesn’t explain why it works with the open-source version and not the OpenAI version.


If folks can post a public link to a few problematic files here in this thread, I can flag it for OpenAI staff to look at, since, I agree, it is a problem.

Also post what the file transcribes to, and the API settings in your request. Thanks.

My file was originally an m4a which I then converted to mp3. I have now tried creating a new recording without conversion and it works OK.


OK, we are onto something. It could be a lossy → lossy conversion issue, as this can add noise to the spectrum and impede correct language identification.
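If re-encoding is the trigger, the safer path is a single decode to lossless WAV instead of stacking two lossy codecs. A sketch of the ffmpeg invocation, built in Python (assumes ffmpeg is on PATH; the filenames are illustrative):

```python
import subprocess

def wav_convert_cmd(src: str, dst: str) -> list:
    # One lossy decode -> lossless WAV: no second codec to smear the
    # spectrum. 16 kHz mono matches Whisper's internal resampling.
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]

# Example (not run here):
# subprocess.run(wav_convert_cmd("note.m4a", "note.wav"), check=True)
```

A WAV is larger than an mp3, but for API uploads under the size limit it removes one variable from the Welsh mystery.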


I think this is honestly it!

A combination of an accent and some loss of quality = Welsh, apparently. (Sorry to my Welsh folks!)
