I live in Wales but speak English. ChatGPT is confused

Like many people in Wales, UK, my predominant language is English. However, multiple times a day when I use voice input, the transcript of what I say comes up in Welsh. The conversation itself makes complete sense while I’m chatting away to ChatGPT, but when I look at the transcript, much of what I said in English is now in Welsh, which of course I can’t read.

Before I actually set English as my main language, ChatGPT would regularly talk back to me in Welsh, even though my prompts were in English. Even now I’ve set English as my main language, told ChatGPT multiple times to never use Welsh, and even added that I never want to be heard or talked to in Welsh at any time using the new customise feature, it still happens with alarming regularity.

I should add that ChatGPT can go for whole sessions without doing this, and then suddenly start doing it either by translating what I’m saying into Welsh, or talking back to me in Welsh. Whenever it feels like.

Help!

This is very odd. I’m using ChatGPT in France, both in English and French, and it usually just sticks to whatever language I’m speaking in.

I’ve noticed that Whisper, the audio transcription engine that OpenAI made and uses in the app, will occasionally get confused by accents and transcribe what you’re saying in the wrong language. If I speak English with a very strong French accent, it’ll translate what I said into French instead of transcribing it in English. Do you have a Welsh accent, by any chance?

I wouldn’t be surprised if that’s exactly the reason why we can choose the preferred language in the settings.

Mine is set to English because the model’s performance in English is objectively superior.

I couldn’t sound more English if I tried. Only moved here five years ago from England. And I’m terrible at accents - whenever I try to mimic an accent for fun I always end up sounding Indian.

So the question is, is ChatGPT using geolocation from my IP in an attempt to talk to me in what it thinks is my native language, or is it using my billing address? If I’d spent the last year asking ChatGPT about Welsh history I’d kind of get it, but I haven’t…

And to reiterate - I have chosen English as my preferred language, which it seems happy to ignore.

What happens when you use the app and set both the app language and the input language to English?
I mean the mobile app specifically, not the web interface.

If you are using the web interface you could also check if changing any of the settings in your browser that refer to ‘Wales’ will help.

And yes, there have been some recent changes, including Geo-IP localization, to improve the user experience. I sure hope that is not the cause here.

Funnily enough, this recent Welsh mix-up happened while using the new Mac app on my MacBook Air. Not sure if that’s simply a web wrapper.

Here is a chat I just had with the ChatGPT app on my iPad. My second question is ‘how is it made’ (regarding bicarbonate of soda). As on my iPhone and Mac, the settings are English for both speech and app. Yet my second question appears in Welsh.

Here is another made directly after the last conversation. Again, I spoke English the entire time.

Just had a Welsh speaker check these chats and they inform me that it’s a North Welsh translation. But we live on the south-western tip of Wales. So it’s not even getting it wrong correctly :joy:

Labelling English as Welsh is a known issue with Whisper.

It could be that the ChatGPT setting does not pass the language details to it for Speech-To-Text.

errors in audio language identification. As an example, Welsh (CY) is an outlier with much worse than expected performance at only 13 BLEU despite supposedly having 9,000 hours of translation data. This large amount of Welsh translation data is surprising, ranking 4th overall for translation data and ahead of some of the most spoken languages in the world like French, Spanish, and Russian. Inspection shows the majority of supposedly Welsh translation data is actually English audio with English captions where the English audio was mis-classified as Welsh by the language identification system, resulting in it being included as translation training data rather than transcription data according to our dataset creation rules.
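
To make the idea of "passing the language details" concrete, here is a minimal sketch of what restricting language identification could look like. It assumes the speech-to-text step exposes per-language probabilities as a dict of language code to score, which is the shape the open-source Whisper package returns from `detect_language`. The helper name `pick_language` and the probabilities below are made up for illustration, and nothing here reflects how ChatGPT actually works internally.

```python
def pick_language(probs, allowed=("en",)):
    """Return the most probable language code among the allowed ones.

    `probs` maps language codes to probabilities (hypothetical values here).
    `allowed` is the set of languages the user has said they speak.
    """
    candidates = {code: p for code, p in probs.items() if code in allowed}
    if not candidates:
        # None of the allowed languages were scored: fall back to the first one.
        return allowed[0]
    return max(candidates, key=candidates.get)


# Made-up scores for an English clip that the detector mistakes for Welsh.
example_probs = {"cy": 0.55, "en": 0.40, "fr": 0.05}

# Without a restriction, the top-scoring language wins.
unrestricted = max(example_probs, key=example_probs.get)

# With the user's languages taken into account, Welsh is never chosen.
restricted = pick_language(example_probs, allowed=("en", "fr"))

print(unrestricted, restricted)
```

The point of the sketch is only that a stated language preference has to reach the transcription step to have any effect; a preference that is applied to the chat model alone would not change what the transcript looks like.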

Whoops! Fascinating stuff. Unfortunately this renders ChatGPT somewhat useless for what I like to use it for. Scrolling back through longer conversations and finding that most of what I said has been translated into Welsh is terribly frustrating. It’s kind of nuts that I then have to copy and paste my own words back into ChatGPT to find out what I said!

Being unable to ‘turn it off’ just adds to the annoyance. I should also add that I’ve told ChatGPT quite explicitly never to talk to me in Welsh and that I don’t speak it, and even when it agrees to comply, it goes ahead and does it anyway.

Just had a thought - would a proxy stop it geolocating? Should do, in theory.

I did a bit of research on the problem by looking up other conversations about the same issue. This can happen regardless of location or of other apps installed on the device. It can happen to anyone, and the reply can be in quite a few languages other than Welsh.
As @RonaldGRuckus pointed out, though, the problem appears to be especially bad for Welsh/English. Something in the recorded audio is triggering Whisper to act strangely, and when using ChatGPT there are not many knobs to turn to try to resolve the issue.

For now, I suggest changing the instructions to just ‘English language only’. Don’t mention Welsh at all. The models work better with positive instructions than with negative ones!

But it looks like there’s no easy solution for this unwanted behavior.

Funnily enough me and my partner had a conversation around this last night. ‘Hey LLM, don’t think of a black cat’.

Which is a good analogy, but be aware that the issue is caused by Whisper, not by the GPT model you are using for the conversations.
We have almost no options to change the behavior of the Whisper model when using ChatGPT.

I’m aware they are two different creatures. But I will be removing any reference to black cats on my machine / prefs in the hope it may close the cattery.

If the country IP is the thing causing the problem, can’t it just be turned off, or forced? If it didn’t think I was in Wales, it wouldn’t be a problem. The easy solution seems to be to stop sharing my IP. If I can use a proxy to solve the issue (selecting a London location), then surely having some level of control in ChatGPT itself would be beneficial to me, and possibly to many others who live in places that are language diverse. I’m thinking of the variety of Spanish and Catalan in northern Spain, and every country that borders Russia with its own language. Can’t imagine how Whisper fares with Moldovan, Lithuanian or Ukrainian speech with a side-order of Russian accent.

Complex issues… but it would be far easier to have a setting that tells Whisper where I would like to be from. Or wouldn’t it?

The Whisper API lets you send a special type of prompt. Unlike a system prompt for a language model, which steers behavior and instructs the model to perform a task in a specific way, the Whisper prompt is not an instruction. It is a short piece of text, here a sentence in English, that signals to the model that the transcription that follows should be in English.

Transcribe the following audio to British English

would be just as effective as

The black cat played cheerfully with the pink elephant

This example is just to explain why the Geo-IP has nothing to do with the solution we are looking for.
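
For anyone calling the API directly rather than going through ChatGPT, here is a minimal sketch of how the request could be set up with the official `openai` Python package. It uses the `language` and `prompt` parameters of the transcription endpoint together. The helper names `build_transcription_args` and `transcribe`, and the file name in the usage note, are my own placeholders. The network call is kept inside a function, so the snippet runs without an API key until `transcribe` is actually called.

```python
from pathlib import Path


def build_transcription_args(language="en", prompt=None):
    """Collect the request options for a Whisper transcription call."""
    args = {
        "model": "whisper-1",
        # ISO-639-1 code; this skips automatic language identification.
        "language": language,
    }
    if prompt:
        # Plain text in the target language, not an instruction.
        args["prompt"] = prompt
    return args


def transcribe(path, language="en", prompt=None):
    """Send one audio file to the transcription endpoint and return the text."""
    # Imported here so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with Path(path).open("rb") as audio:
        result = client.audio.transcriptions.create(
            file=audio, **build_transcription_args(language, prompt)
        )
    return result.text


args = build_transcription_args(
    prompt="The black cat played cheerfully with the pink elephant."
)
print(args)
```

Usage would look like `transcribe("speech.m4a", prompt="...")`, where `speech.m4a` is a hypothetical recording. None of this is available from inside the ChatGPT apps, which is the whole problem: the option exists at the API level, but there is no setting that exposes it.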