Hello everyone!
I am working on a bot that needs to be able to translate short food-item user queries from any language into English.
Something simple like:
Hühnerbrust → Chicken Breast
Медовая дыня → Honeydew Melon
Jamón de cerdo → Pork Ham
Goose Breast → Goose Breast
This seemed like an easy problem to solve, but I ran into some unexpected issues.
What I’ve tried so far
1. DeepL API
First, I tried a simple API call to DeepL. This worked decently well, but in rare cases it didn’t produce the results I wanted. I want the ability to “hardcode”/train a model so that a few particular queries always come out right, which I can’t do with DeepL.
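For reference, the DeepL call is essentially this (a minimal sketch using the official deepl Python package; the auth-key placeholder and helper name are just illustrative):

```python
import deepl

translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")  # placeholder key

def translate_food_item(query: str) -> str:
    # DeepL auto-detects the source language when source_lang is omitted
    result = translator.translate_text(query, target_lang="EN-US")
    return result.text

print(translate_food_item("Hühnerbrust"))  # expected: "Chicken Breast"
```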
2. Prompt Design
Next, I tried a carefully designed prompt with GPT-3.5 Turbo. This was about as robust as DeepL. With a prompt I can “hardcode” a few queries that it should always get right, but I am restricted to the small number that fits in the prompt, and I also want to keep the tokens per call low.
Here is the prompt I used:
Translate the following food-item query into English, adhering strictly to the provided JSON format for the output.
The input is most likely in German language but can also be in any other language. If the input is already in English, leave it as is.
The text to translate is: '*insert food query here*'.
Please note that your response should only contain the translation in the exact JSON format provided below, with no additional text, comments, or formatting.
Example:
Input: 'Hirsch Salami'
Output: {"translation": "Deer Salami"}
Input: 'Pfirsich'
Output: {"translation": "Peach"}
Input: 'chicken'
Output: {"translation": "chicken"}
Please ensure your response is formatted exactly like the examples provided, without any additional explanatory text.
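For completeness, this is roughly how I send that prompt (a sketch assuming the current openai Python client; the helper name and the way the query is substituted into the template are just illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate_with_prompt(query: str, prompt_template: str) -> dict:
    # prompt_template is the instruction text quoted above, with
    # '*insert food query here*' replaced by the actual user query
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "user", "content": prompt_template.replace("*insert food query here*", query)}
        ],
    )
    # The model is instructed to answer with {"translation": "..."} only
    return json.loads(response.choices[0].message.content)

# translate_with_prompt("Hirsch Salami", PROMPT_TEMPLATE)  # -> {"translation": "Deer Salami"}
```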
The main issue with the results
I have a list of about 300 food items that, if queried by the user, must produce a translation that matches my needs exactly, without fail.
As the translator keeps being used, I also want to be able to expand this list so that it continuously improves.
The translator must also be able to translate any other food query that is not on that list, but for those translations rare mistakes are permissible.
If either DeepL or my prompt got the translation of one of the 300 important queries wrong, there was no way I could “teach” them to get it right the next time.
So next I set out to create a fine-tuned model.
My expectation was that I could hardcode the 300 food items and that all other possible queries would be handled by the knowledge that is stored within the base model.
The fine tune
This is the current file I am testing “file-e54mVKFoXcdMfUX2apo71gPy” (I don’t know if you guys are able to access the content via the retrieve file content request)
The prompt-completion pairs all look like this:
{"prompt": "Translate the following food item into English: \"Pflaume\" -->", "completion": "{\"translation\": \"Plum\"}"}
{"prompt": "Translate the following food item into English: \"Banane\" -->", "completion": "{\"translation\": \"Banana\"}"}
{"prompt": "Translate the following food item into English: \"Alitas de pollo sin hueso\" -->", "completion": "{\"translation\": \"Chicken Wings Bone Removed\"}"}
{"prompt": "Translate the following food item into English: \"Queso de cabra crudo\" -->", "completion": "{\"translation\": \"Raw Goat Cheese\"}"}
{"prompt": "Translate the following food item into English: \"Ryż Carnaroli gotowany\" -->", "completion": "{\"translation\": \"Carnaroli Rice Cooked\"}"}
{"prompt": "Translate the following food item into English: \"Chleb na zakwasie\" -->", "completion": "{\"translation\": \"Sourdough Bread\"}"}
{"prompt": "Translate the following food item into English: \"Codillo de cerdo\" -->", "completion": "{\"translation\": \"Pork Knuckle\"}"}
{"prompt": "Translate the following food item into English: \"Pechuga de cordero\" -->", "completion": "{\"translation\": \"Lamb Breast\"}"}
{"prompt": "Translate the following food item into English: \"Ailes de poulet désossées\" -->", "completion": "{\"translation\": \"Chicken Wings Bone Removed\"}"}
{"prompt": "Translate the following food item into English: \"Jus d'orange frais\" -->", "completion": "{\"translation\": \"Fresh Orange Juice\"}"}
I use babbage-002 as the base model because I think my needs aren’t too complex for it to handle and because it is the cheapest and fastest option.
The file contains 761 prompt-completion pairs.
I set the temperature to 0 when calling the fine-tuned model to make it as deterministic as possible.
The data also does not include any contradictions.
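To be concrete, I query the fine-tuned model roughly like this (a sketch; the model id and max_tokens value are placeholders, not my exact settings):

```python
from openai import OpenAI

client = OpenAI()

FINE_TUNED_MODEL = "ft:babbage-002:..."  # placeholder for my actual fine-tuned model id

def translate(query: str) -> str:
    # Same preamble and '-->' separator as in the training file
    response = client.completions.create(
        model=FINE_TUNED_MODEL,
        prompt=f'Translate the following food item into English: "{query}" -->',
        temperature=0,
        max_tokens=30,
    )
    return response.choices[0].text.strip()

# translate("Putenbrust")  # expected: {"translation": "Turkey Breast"}
```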
I think fine-tuning is an appropriate method for this task because 1) I can define the format of the output (in this case JSON) and 2) I can hardcode the 300 important queries while relying on the base model’s knowledge for all other queries.
My problem with the fine tune
My problem is that the model quite often returns results that are plainly false. I cannot even explain those results by saying that the base model takes precedence over my training: I assume proper translation is already part of the babbage model, so I do not understand how it can return false results when even a plain prompt produces proper ones.
Here are a few examples of false results:
Query: “Putenbrust”
Expectation: “Turkey Breast”
Result from Finetune: “Pork Breast”
Query: “Schlagobers”
Expectation: “Cream”/“Whipped Cream”
Result from Finetune: “Sugar Butter”
Query: “Pute”
Expectation: “Turkey”
Result from Finetune: “Pheasant”
(All of these queries of course contained the preamble “Translate the following food item into English: “food item” -->” just like it is defined in the fine-tune file)
What makes these results even crazier?
These exact queries are already in the training file. Shouldn’t a temperature of 0 make the model deterministic and return the exact completion that was trained if the same prompt is given to the model?
Here are these exact prompts from the fine tune:
{"prompt": "Translate the following food item into English: \"Putenbrust\" -->", "completion": "{\"translation\": \"Turkey Breast\"}"}
{"prompt": "Translate the following food item into English: \"Pute\" -->", "completion": "{\"translation\": \"Turkey Breast\"}"}
{"prompt": "Translate the following food item into English: \"Schlagobers\" -->", "completion": "{\"translation\": \"Cream\"}"}
I’ve worked with fine-tunes for quite a while now, but it seems I still have some fundamental misunderstanding of how they actually work.
I’d appreciate any assistance, whether it relates to my current issue or other potential solutions to my problem!