Fine-tuning GPT-3 for niche language

I was surprised to see how well GPT-3 worked in Danish, a small language spoken by just 6 million people. Not flawlessly, though: it will switch to Norwegian or even English without proper prompt engineering, and it only really performs well with the DaVinci model. It generally falls short of equivalent prompts in English, which makes sense.

Commercial tools like Jasper boast support for smaller languages as well, but the results are very similar to the vanilla GPT-3 experience.

My instinct has been to create a fine-tuned version of DaVinci. My previous attempts (with Curie) have been disappointing. In those cases, I followed the fine-tuning guidelines on “internal company jargon”, which recommend empty prompts and long completions:

{"prompt":"", "completion":" <legal document>"}
{"prompt":"", "completion":" <company product catalogue>"}

The result was mostly gibberish, as if you had taken a sledgehammer to Curie. It seems like this person had a similar experience.

My intention is to use a much larger data set than before, but before I break the bank on this I’d like to do my homework properly, hence this post.

I’ve followed the advice in this post and have had success with better prompt engineering, but am still optimistic that a fine-tuned model trained on Danish data would be an improvement.

As I understand it, fine-tuning is mostly relevant to specific tasks that you want consistently executed (for lower cost and latency). Throwing 10,000 examples of Danish at it would be like retraining GPT-3 – but that’s not necessary. It knows Danish, and it knows it very well. Just not consistently well.

My current hypothesis is: take chunks of text, break them in half, and serve the first part as the prompt and the second as the completion. I don’t know whether it’s best practice to keep the prompt the same length as the completion, how many tokens to use in each, whether to standardize the lengths (e.g. “always use 1,500 characters in the prompt and 1,500 in the completion”), or whether to make them random:

{"prompt":"<start of paragraph in Danish>", "completion":" <end of paragraph in Danish>"}
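A minimal sketch of how such a training file could be generated. This assumes the paragraphs are already collected in a list; the half-way split point and the minimum-length cutoff are arbitrary choices here, not recommendations:

```python
import json

def make_examples(paragraphs, out_path="danish_finetune.jsonl"):
    """Split each paragraph roughly in half and write one
    prompt/completion pair per paragraph as a JSONL line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for text in paragraphs:
            words = text.split()
            if len(words) < 10:  # skip paragraphs too short to split usefully
                continue
            mid = len(words) // 2
            prompt = " ".join(words[:mid])
            # OpenAI's data-preparation guidelines suggest completions
            # start with a leading space
            completion = " " + " ".join(words[mid:])
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# hypothetical single-paragraph corpus for illustration
paragraphs = [
    "Dette er et eksempel på en dansk tekst, som bliver delt i to halvdele til træning."
]
make_examples(paragraphs)
```

Whether to split at the midpoint, at a sentence boundary, or at a random offset is exactly the open question above; the function only fixes one possible choice.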

If anyone has tried something similar (or can make sense of my ramblings), I’m hoping to hear from you and pick up any lessons you have to share.


My intuition suggests that this is a very worthwhile experiment to run. I suspect you’ll get decent results. It could also prove valuable to the rest of the community if you give this a try and then share results.


Unfortunately, I don’t think it will work unless your goal is to generate random Danish texts.
You need to combine it with a prompt design that also includes the end goal of this experiment. For example, a question and an answer about the text.

I fine-tuned a model on the book Moby Dick by feeding it both prompts and completions: one line was the prompt, the next line was the completion.

That model, more than any of my others (which all use blank prompts), seems to have retained the ‘signature’ of its training data.

I am now considering a model which uses a single word for the prompt and the following word for the completion. My theory is that this might retain the ‘voice’ of the training data, while preventing any of the training data from leaking through into future completions.
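For what it’s worth, building that kind of word-pair dataset is only a few lines. This is a sketch of one way to do it (the function name and output path are made up for illustration):

```python
import json

def word_pairs(text, out_path="word_pairs.jsonl"):
    """Emit one JSONL record per adjacent word pair:
    each word is the prompt, the following word the completion."""
    words = text.split()
    with open(out_path, "w", encoding="utf-8") as f:
        for prev, nxt in zip(words, words[1:]):
            record = {"prompt": prev, "completion": " " + nxt}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# toy input: a six-word string yields five prompt/completion pairs
word_pairs("Call me Ishmael some years ago")
```

Note that single-word prompts give the model almost no context per example, so it’s an open question whether this preserves the ‘voice’ in practice.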

Maybe some of this applies to your project?


Sounds great. Does it enable you to perform other tasks while retaining the style?

@jhsmith12345 I too am interested in whether it can perform tasks while retaining the style of language it’s been trained on.


@mhenglein I’m trying to do the same thing basically and I’m wondering if you have any update on this?