Specialized Translation Task: Fine-tuning vs examples

My task is to translate documents from another language into English while matching the style of a specific translator. There’s a huge volume of that translator’s work available that I could use as examples. I’ve tried two things:

  1. Give GPT-3.5-turbo a prompt that includes an example. This seems to work, but it only partially captures the translator’s style because a single example simply doesn’t contain enough data. E.g. the translator might consistently translate “misericordia” as “compassion”, while the language model will sometimes use other words depending on the context. If the example doesn’t include “misericordia”, then GPT won’t pick up this aspect of the translator’s style. Hence, I want to train on a lot of examples.

  2. Run fine-tuning. I trained a davinci model on 180 pairs, each prompt being the untranslated text and each completion being the translator’s English version. However, when I queried the model I got text, written in the style of the translator, that was completely unrelated to the prompt! Perhaps it’s not “realizing” that the task is translation, whereas in the prompt for approach 1 I specifically say, “translate into English”? I’m open to providing more data but I’m not made of money, so I only want to train on a larger data set if I have some idea that it’s going to work.
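One possible cause of the unrelated output in approach 2, not confirmed anywhere in this thread, is that the training prompts had nothing marking where the source text ends and the translation begins. A base model fine-tuned on bare prompt/completion pairs tends to just keep writing in the learned style. Below is a minimal sketch of how the pairs could be prepared instead, with an explicit instruction, a fixed separator at the end of every prompt, and a stop sequence at the end of every completion. The separator, the stop string, the instruction wording, and the function names are all assumptions chosen for illustration, and the sample pair is made up.

```python
import json

# Assumed markers: any strings work as long as they never occur in the texts
# and are used identically at training time and at query time.
SEPARATOR = "\n\n###\n\n"   # ends every prompt
STOP = " END"               # ends every completion; pass it as the stop sequence when querying
INSTRUCTION = "Translate into English:\n\n"


def make_prompt(source_text):
    """Build the prompt exactly the same way for training and for querying."""
    return INSTRUCTION + source_text.strip() + SEPARATOR


def make_record(source_text, translation):
    """One training example in prompt/completion form."""
    # The completion starts with a space and ends with the stop sequence.
    return {
        "prompt": make_prompt(source_text),
        "completion": " " + translation.strip() + STOP,
    }


def write_jsonl(pairs, path):
    """Write (source, translation) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for source_text, translation in pairs:
            record = make_record(source_text, translation)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Made-up sample pair, for illustration only.
    print(json.dumps(make_record("misericordia", "compassion"), ensure_ascii=False))
```

When querying the fine-tuned model, the prompt would be built with the same `make_prompt` helper and the request would use `STOP` as its stop sequence, so the model sees the same "source text, then separator" shape it was trained on.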

Is there any hope for fine-tuning with this use case? If so, any tips on how to do it?



Did you figure out a way to resolve this?

I am doing something similar, but with gpt-3.5-turbo. I tried fine-tuning the davinci model but ran into the same problem you described.

It seems like embeddings are the way to go, although it will take more work to get them working.
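For anyone wondering what the embeddings route might look like: the idea, as I understand it, is to pick the example pairs whose source text is most similar to the passage being translated and put those in the prompt, so a passage containing “misericordia” gets examples that also contain it. Here is a rough sketch under heavy assumptions. The `embed` function is only a word-count stand-in so the code runs without any API; a real setup would swap in vectors from an actual embedding model. The function names, prompt wording, and sample pairs are all made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(query, pairs, k=2):
    """Return the k (source, translation) pairs whose source is closest to the query."""
    query_vec = embed(query)
    ranked = sorted(pairs, key=lambda p: cosine(query_vec, embed(p[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, pairs, k=2):
    """Assemble a few-shot translation prompt from the most similar examples."""
    parts = ["Translate into English, matching the style of the examples."]
    for source, translation in select_examples(query, pairs, k):
        parts.append(f"Source: {source}\nTranslation: {translation}")
    parts.append(f"Source: {query}\nTranslation:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Made-up toy pairs, for illustration only.
    pairs = [
        ("misericordia et pax", "compassion and peace"),
        ("lux mundi", "light of the world"),
    ]
    print(build_prompt("misericordia", pairs, k=1))
```

The resulting prompt would then be sent to gpt-3.5-turbo as usual. The point of the design is that the examples change per passage, so the full archive of translations can be used without fine-tuning on it.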