Why does fine-tuning not work but Assistants do?

I am relatively new to using GPT, and was hoping to get some clarification on the differences between fine-tuning and assistants.

I am playing around with a simple model that converts words written with American spelling conventions into their Canadian spellings. The following is an example of a line in my tuning dataset:

{"messages": [{"role": "system", "content": "You are a chatbot focused on using Canadian spelling conventions. Here are some examples of words commonly spelled different in Canada compared to the US:  Instead of favorite or color, write favourite and colour. Instead of theater or center, write theatre or centre. Instead of check, write cheque.  Instead of catalog, write catalogue. Instead of traveler, write traveller. Instead of defense, write defence. If a sentence is inputted without correct Canadian spelling conventions, please correct it. Make sure all output contains correct Canadian spelling."}, 
{"role": "user", "content": "This is my favorite movie."}, 
{"role": "assistant", "content": "This is my favourite movie."}]}

I have the minimum 10 lines in my dataset. However, when I use the fine-tuned model, the output is no different than if I used base GPT-3.5.

On the other hand, putting the instructions into an assistant works perfectly. Using the same instructions as in the dataset above, I get the output I want:

What is the reason behind this? Am I using fine-tuning wrong?

Hi and welcome!

I would think that the base GPT-3.5 Turbo model already has the knowledge to do that spelling translation, so fine-tuning it may not have much of an effect.

Fine-tuning is very useful when your outputs are not as consistent as you expect; you can make those cases more consistent by providing fine-tuning examples.

I am no expert myself, though; my guess is that the base knowledge is good enough for your use case.


Is a reliable resource.

Your training shows you using a long system prompt.

Did you continue to supply the system prompt that you trained on when using the fine-tuned model?

You trained the model that a special input sequence (the system prompt) produces a different behavior, and then didn't activate that training by supplying the special sequence?

"I have the minimum 10 lines" is almost verbatim an inside joke about people who think that supplying the minimum gets them an AI that does what they want, instead of laboriously writing examples for thousands of inference cases, including negatives and out-of-domain answers.

User messages still work best with a short instruction directly accompanying the data, such as “process this” or “translate to Canadian English:”
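A minimal sketch of that suggestion, assuming a hypothetical helper `make_example` and the prefix "Translate to Canadian English: " borrowed from the wording above. The sentence pairs are illustrative only (the first comes from the question), and the output uses the same chat-format JSONL shape as the training line shown in the question.

```python
import json

# Assumption: a short instruction placed directly in front of the user data.
PREFIX = "Translate to Canadian English: "

# Illustrative pairs; a real training set would need far more, including
# sentences that are already correct and should come back unchanged.
PAIRS = [
    ("This is my favorite movie.", "This is my favourite movie."),
    ("The theater is in the center of town.", "The theatre is in the centre of town."),
    ("My favourite colour is blue.", "My favourite colour is blue."),
]


def make_example(source: str, target: str) -> dict:
    """Build one chat-format training example (hypothetical helper)."""
    return {
        "messages": [
            {"role": "user", "content": PREFIX + source},
            {"role": "assistant", "content": target},
        ]
    }


def to_jsonl(pairs) -> str:
    """Serialize the examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(make_example(s, t)) for s, t in pairs)


if __name__ == "__main__":
    print(to_jsonl(PAIRS))
```

The same prefix would then need to be sent with every user message when calling the tuned model, so that the input looks like what it was trained on.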


Hi, I did not continue to supply the system prompt on the tuned model. This was just a test model I was making. My ultimate goal is to create a model that will write a description for a car when given only the car and dealership information. I was hoping to provide minimal instructions in my prompt, and was under the impression that providing the instructions in the training dataset would cause the tuned model to follow these instructions consistently. Since this doesn’t seem to be the case, do you think I can still use fine-tuning for my purposes?

Will check this out, thank you!

You are training an AI by example. If you show it examples of receiving car models and outputting car descriptions, that is what the fine-tuned model will do. You could provide a system message “Car mode” in both your training file and in your use, and that identity reinforces the trained behavior.

Not providing the same system prompt you trained on just degrades the behavior-following. “Car mode” won’t appear as easily with “You are ChatGPT” – or nothing as your system prompt.
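As a rough sketch of keeping the two in step, assuming hypothetical helpers `training_line` and `inference_messages`: both read the system prompt from a single constant, so the tuned model sees the same identity at inference that it saw in training. The car and dealership strings are made-up placeholders.

```python
import json

# Assumption: one short identity string shared by training and inference.
SYSTEM_PROMPT = "Car mode"


def training_line(car_info: str, description: str) -> str:
    """One JSONL line for the training file (hypothetical helper)."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": car_info},
            {"role": "assistant", "content": description},
        ]
    })


def inference_messages(car_info: str) -> list:
    """Messages to send to the tuned model: same system prompt, no answer."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": car_info},
    ]


if __name__ == "__main__":
    # Placeholder data, not from a real dealership.
    info = "2019 compact sedan, 40,000 km, Example Motors"
    print(training_line(info, "A tidy, low-mileage sedan from Example Motors."))
    print(inference_messages(info))
```

Changing `SYSTEM_PROMPT` after training would recreate the mismatch described above, so the constant is best treated as part of the trained model.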

The complexity is that you are training over the top of gpt-3.5-turbo, which already has tons of chat training. A strong identity, which can still include instructions, signals a departure from what it already does.

You can see how a fine-tune might perform by providing multi-shot examples to a chat or completions model, as shown here, where the last reply is generated from the last user question:
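A rough sketch of what such a multi-shot prompt could look like in chat format, assuming a hypothetical helper `multi_shot_messages`. The example pairs are illustrative, and the resulting list is meant to be passed as the `messages` of a chat request, leaving the model to generate the final assistant turn.

```python
def multi_shot_messages(system: str, shots: list, question: str) -> list:
    """Build a chat prompt with worked examples ahead of the real question.

    `shots` is a list of (user_text, assistant_text) pairs (hypothetical helper).
    """
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in shots:
        # Each example is replayed as a past user/assistant exchange.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The last user turn has no answer; the model is expected to produce it.
    messages.append({"role": "user", "content": question})
    return messages


if __name__ == "__main__":
    prompt = multi_shot_messages(
        "You rewrite sentences using Canadian spelling.",
        [
            ("This is my favorite movie.", "This is my favourite movie."),
            ("I wrote a check.", "I wrote a cheque."),
        ],
        "The traveler checked the catalog.",
    )
    for message in prompt:
        print(message["role"], ":", message["content"])
```

If the model handles the last question well with a handful of shots, that gives a cheap indication of what training on many such examples might achieve.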

If you are imparting new knowledge, it may take wide coverage of every input scenario. If you need absolute correctness, you could instead train the AI to rework database text that is retrieved using the input search terms and injected into the prompt.
