How do I best coach the model to respond in correct regional English? I’ve tried prompt engineering, but it drifts back to American spelling. In generated content it’s pretty jarring for users to see “color”, “favor”, and “summarize” instead of “colour”, “favour”, and “summarise”.
Can you share the system prompt you’re using?
Maybe use examples like colour and flavour, etc., in the system prompt?
Welcome to the forum, btw.
Welcome to the forum!
As mentioned, showing us your prompt can help us figure out why it’s unable to reproduce a specific variety of English.
As someone with a strong linguistics background, I can tell you that it can produce what you’re asking for, fairly easily in fact. You just need to specify what linguistic variety of English you’re aiming for, with a few other adjustments, but we can handle that when we see the prompt.
These LLMs respond well to clear, precise vocabulary. The proper linguistic term for what you’re looking for is a “variety” of English: not dialect, not accent. Specifying the linguistic variety lets it generate output pretty accurately, in my experience, because there’s a lot of empirical, peer-reviewed data about varieties of English that the AI has likely been trained on.
This would also be a great addition to specify in a potential GPT builder application btw.
Hi Macha, I have the same issue as aex. The spelling and vocabulary used are very often American English instead of British, even when I try to instruct the LLM accordingly.
This is the prompt I used for EN-British for example, how could I make it clearer?
Use British English and exclusively British (UK) English.
Strictly follow UK spelling conventions (e.g., use ‘colour’ not ‘color’, ‘realise’ not ‘realize’).
Use UK vocabulary and phrases (e.g., ‘flat’ not ‘apartment’, ‘lorry’ not ‘truck’).
Apply UK formatting standards.
Do not use American English spellings, terms, or formats under any circumstances.
Hey there and welcome to the community!
Can you post an example of what it produces after that prompt? Your prompt seems like it should work, but without seeing the output, I can’t tell what the issue is. This may be due to prompting, or it may be due to how it processes training data; I’m still unsure.
I’ve only just done the briefest of tests, but in ChatGPT, setting a custom instruction such as,
You are GB-GPT, a large language model developed for users in Great Britain. At all times one must always adhere to solely using the Queen’s English, as it would be scandalous to British users to be confronted with American vernacular and spellings.
seems to work, though I’ve not tested it beyond this.
https://chat.openai.com/share/28463d2c-fe19-4b83-8ba4-99901292b090
https://chat.openai.com/share/880699c5-dd69-4e79-83ba-aebfafaa0d4c
Do you have an example exchange where the model falls down in this regard?
Edit: one thing I’ll note here is that the request to adopt a particular dialect is actually much more complex for the model to accomplish than you might realize. Especially if it is interacting with a user who is communicating with a different dialect than your target dialect and which is better represented in the training data.
One thing you could try (rough Python sketch after this list) would be to,
- First generate the response naively, with no special instructions.
- Gather a list of common British alternative words/spellings.
- Use regex to augment the response, appending the alternative after each potentially incorrect word, e.g.
lease an apartment
might become,
lease an [apartment/flat]
- Send the augmented text for another pass with an LLM and an instruction like,
You are localizing this text for use in the UK and it needs to be converted to traditional British English. Throughout the text there may be instances where
[original/alternate]
appear. You should choose the word which best reflects traditional British English in the context of the text.
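A rough sketch of that pipeline in Python, assuming the openai package; the word list is obviously abbreviated, the model name is a placeholder, and the whole thing is an idea rather than a tested recipe:

```python
import re

from openai import OpenAI  # assumes an API key in the environment

client = OpenAI()

# Illustrative en-US -> en-GB pairs; a real list would be much longer.
US_TO_GB = {
    "color": "colour",
    "apartment": "flat",
    "organize": "organise",
    "elevator": "lift",
    "truck": "lorry",
}

def augment(text: str) -> str:
    """First pass helper: append the British alternative after each suspect
    word, e.g. 'lease an apartment' -> 'lease an [apartment/flat]'."""
    for us, gb in US_TO_GB.items():
        pattern = re.compile(rf"\b{re.escape(us)}\b", re.IGNORECASE)
        text = pattern.sub(lambda m, gb=gb: f"[{m.group(0)}/{gb}]", text)
    return text

def localise(text: str) -> str:
    """Second pass: have the model resolve each [original/alternate] pair."""
    instruction = (
        "You are localizing this text for use in the UK and it needs to be "
        "converted to traditional British English. Throughout the text there "
        "may be instances where [original/alternate] appear. Choose the word "
        "which best reflects traditional British English in the context of "
        "the text, and return only the corrected text."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": augment(text)},
        ],
    )
    return response.choices[0].message.content

print(localise("I lease an apartment and take the elevator to work."))
```

The bracket form keeps both options until the model has context to choose, so “soccer” can survive in a sentence that’s genuinely about the American sport.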
The English queen is dead, so you need to modify your prompt accordingly, although if you’re using 3.5 that may not be an issue. Apart from that, ‘moaning’ to the LLM will not generate positive results, given the automatic ‘sentiment analysis’ occurring in your conversation. Ipso facto, then: my feedback.
I am aware.
I specifically chose to use the phrase “Queen’s English” though because I’m not concerned about being right or proper as much as I’m trying to cajole a statistical model into exhibiting a particular behaviour.
Queen Elizabeth II reigned 6 February 1952 – 8 September 2022, so it’s very likely there is vastly more training data referencing the “Queen’s English” than there is referencing the “King’s English,” even for models with later data cutoff dates.
AKA: It doesn’t need to be “right” it just needs to work.
Regardless, I still think that for any “critical” applications it will be necessary to perform some pre/post-processing to guarantee consistent and reliable results.
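For spelling specifically, the post-processing step doesn’t even need a model in the loop; a blunt, deterministic substitution pass is the only thing that truly guarantees the result. A sketch, with an obviously abbreviated word list (a real one, e.g. derived from the VarCon data, runs to thousands of pairs):

```python
import re

# Abbreviated illustration; note this catches only exact forms, so
# inflections like "organized" would need their own entries.
SPELLINGS = {
    "color": "colour",
    "organize": "organise",
    "analyze": "analyse",
    "jewelry": "jewellery",
    "aluminum": "aluminium",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SPELLINGS)) + r")\b", re.IGNORECASE
)

def force_en_gb(text: str) -> str:
    """Replace en-US spellings with en-GB equivalents, preserving an initial capital."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        replacement = SPELLINGS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(fix, text)

print(force_en_gb("Color schemes help organize the jewelry display."))
# -> Colour schemes help organise the jewellery display.
```

The caveat is that this only safely handles pure spelling variants; vocabulary pairs like apartment/flat are context-dependent (an American character in a story should probably keep their apartment), which is where the two-pass approach above is the safer tool.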
Thanks for elaborating on this. I couldn’t find the right words, but as you said, if a user is writing in a different variety than the target variety, the model becomes primed for the user’s variety, not the target one.
Which is why, until we see actual input/output pairs in full here, it’s tough to assess what is going on.
On the flip side, if you take a moment to think about it, the vast majority of training data scraped from the internet almost certainly uses American English, so there’s likely far more data in American varieties than in others. If we think about how these systems actually work, predicting the most likely next word, then choosing “apartment” instead of “flat” may be as simple as that word being more probable according to the model and its training data. The other thing to note is that any guardrails or manually implemented text are likely written in American English, because OpenAI itself is an American company. Again, it all depends on the specific prompts and responses that are causing the undesirable behavior.
The kinds of queries being asked may also play a role here. If people are getting frustrated because it’s using “color” instead of “colour” when prompting (or generating prompts) for DALL-E, that is somewhat out of our control because of model quirks and how particular image-gen models need to be prompted to elicit a desirable output. If people are trying to generate a story or some other form of writing that requires British varieties of English, that becomes a different problem.
Hello everyone, thanks so much for all your answers (and sorry for my delayed response). Here are a few examples of input/ output.
I’ve been asking GPT to improve emails (the input) and write them in EN-UK (the output). I won’t paste the full emails here, but the “improved” EN-UK versions contain the following “mistakes”:
Spelling:
- Color instead of Colour
- Organize instead of Organise
- Analyze instead of Analyse
- Jewelry instead of Jewellery
- Aluminum instead of Aluminium
Vocabulary:
- Apartment instead of Flat
- Elevator instead of Lift
- Soccer instead of Football
- Subway instead of Underground
Let me know if the above examples are enough.
And yes, I suspect one of the issues is that GPT is trained mostly on EN-US (because of the availability of data), but I’d have thought that, for improving text like the examples above (quite basic language), GPT has enough UK data points to know the correct answer.
Let me know, thanks!!
Just realized: it seems this issue mainly happens with GPT-3.5, and less so with GPT-4. Did you have the same experience?
I find using the IETF language tag (the ISO 639-1 language code plus a region code) works best, so that’s en-GB for British English (not en-UK), and en-AU, en-NZ, en-CA, en-IE, etc. Stating en-GB spelling is generally a safe bet for all non-US countries unless you are very particular about some of the language differences. If you say to use Australian English, it starts to overdo it, saying things like “G’day mate!” Unless, of course, that’s what you want…
Same with saying the “Queen’s English” (or King’s, although I can’t see that phrase catching on soon!): you’ll probably find the content comes out rather formal, especially as that’s more about accent than spelling.
If it’s still coming out with colors, perhaps then try telling it to replace en-US spellings.
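If you’re hitting the API rather than ChatGPT, the tag drops straight into the system message. A minimal sketch in Python (the model name is only a placeholder, and the deliberately en-US user message is just a smoke test):

```python
from openai import OpenAI  # assumes the openai package (v1+) and an API key in the environment

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use whichever chat model you have access to
    messages=[
        {"role": "system",
         "content": "Respond using en-GB spelling and vocabulary only; never use en-US spellings."},
        {"role": "user",
         "content": "Write a short paragraph about moving into a new apartment."},
    ],
)
print(response.choices[0].message.content)
```

If the reply comes back with “flat” and “colour” despite the American bait in the prompt, the instruction is holding.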
I tried a prompt to get it to generate several of those words:
“Using en-GB spellings (not en-US), write a short first-person story about a woman getting ready in the morning, including picking out her clothing, and a necklace and bracelet out of a box to match, and the multi-story building she lives in, and how she gets to work. Keep it under 100 words.”
"As I woke up in my cozy flat in a multi-storey building, the soft rays of the morning sun kissed my face. I stretched and yawned, ready to start the day. After a quick shower, I stood before my wardrobe, carefully selecting a floral dress and a cardigan. From my jewellery box, I picked out a delicate necklace and a matching bracelet.
With my outfit complete, I descended the stairs of our bustling building and hopped on the red double-decker bus, my usual mode of transport to work. The brisk morning breeze brushed against my face as I gazed out of the window, looking forward to another day in London’s vibrant cityscape."
With en-AU I was having trouble getting it to generate enough content, so I increased the limit to 200 words, although it seems to struggle with en-AU.
As I woke up in my cozy apartment on the tenth floor of a multi-story building, the warm Australian sun streamed through my window. I stretched and yawned, feeling grateful for another beautiful morning.
I shuffled over to my wardrobe and picked out a lovely sundress with colourful floral patterns. I knew it would be a scorcher today. Opening my jewelry box, I carefully selected a delicate necklace and a matching bracelet, both adorned with opals to complement my outfit.
After slipping on my jewelry, I glanced at the clock and realized I needed to hurry. I grabbed my bag and dashed out of my apartment, locking the door behind me. The elevator was busy, so I decided to take the stairs. The cool breeze on each landing helped me wake up a bit more.
Outside, I hailed a taxi to take me to work. As we weaved through the bustling city streets, I couldn’t help but admire the vibrant energy of Sydney. I arrived at the office just in time for another day of hard work, feeling refreshed and ready to take on the world.
I’d try with just using en-GB to start with, and then if you see any regularly occurring mistakes, specify those. (or just edit them after!)
Australian English can sometimes go with both; I highlighted “realized”, although that’s more commonly used in en-AU than en-GB.
Hands down, the best way to overcome this type of bias is to add the term ‘en-GB’ to your custom instructions in the output section. I have added the following phrase and no longer have to spend hours editing out Americanisms: “Replies must ALWAYS be in plain UK English (en-GB) spelling and grammar.” Let me know how that works for you. Good luck!
Hi everyone,
- Has anyone found out if writing your prompt in US-English vs UK-English makes much of a difference to the output? For context, some of the prompt we are using contains US-English, but it says the response should be in UK-English (or something to that effect). I’m wondering whether simply changing the prompt itself to UK-English will have much of an impact, or whether the difference will be negligible.
- Does asking for the response in ‘en-GB’ rather than ‘British English’ have a higher impact on the response being in the right variety of English, please?
This worked for me; however, after trying the same input with Australian English (en-AU), ChatGPT still struggles.
No worries though they’re effectively the same thing anyways (at least closer than US English).
Appreciate it!