Fine-tuning with conversation format: Which messages are used for training?

Hello everybody,

I have a question regarding the latest conversation format for fine-tuning models. Specifically, when we use a sequence of messages for training, is the model trained to generate only the final assistant message in the sequence, with all previous messages used as input/context, or is it trained to generate each assistant message separately?

Consider the following training example:

{
  "messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."},
    {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"},
    {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}
  ]
}

In this context, does the training process focus only on the final response about Shakespeare? Or does it also train on generating the preceding “Paris” response?

Thanks for any insights or clarifications!

It will try to achieve both; hopefully it will distil out the things that make those replies the way they are. For example, showing all replies in JSON format will teach the model to output JSON, and using pirate speak will similarly train the model to follow along.

The message series that you show doesn't require any context from earlier turns to answer the second question. It does show the model that, if it receives an input like the whole conversation, it should answer the last question.

The conversation-tracking ability is already part of the gpt-3.5-turbo tuning that OpenAI has done to make the model chatbot-ready, though.

A better conversation example would be one that shows you always get back JSON, no matter how deep the conversation is.
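
For example, a training sample along these lines (hypothetical content, modelled on the JSON example above) demonstrates JSON output at every assistant turn, not just the last one:

{
  "messages": [
    {"role": "system", "content": "You always respond in JSON."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "{\"answer\": \"Paris\"}"},
    {"role": "user", "content": "And of Germany?"},
    {"role": "assistant", "content": "{\"answer\": \"Berlin\"}"}
  ]
}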

Then one has to ponder how the model will interpret something like this:

user: What’s the best solution for puree?
assistant: I recommend the WidgetCorp food whacker 4000.
user: That only dices and slices. Puree means to blend.
assistant: I understand. The Blendaroo 2 is what you’d want.

Is that showing the AI that the puree question should be answered by the first response, or is it better training the AI how to answer overall? Experimentation is required now that the new format makes it easy to pass long conversations in as training data.

Thanks for the reply, but I would specifically like to know whether it's trained to generate the previous messages (as opposed to using them purely as input).

The reason I ask is that my app has very long conversations, but I usually use quite a short context window. Back when I used the completions API, I noticed that a fine-tuned model performs best when it's trained on the context length it will later receive during real use.
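
For instance, a hypothetical trimming helper (the name and parameters are mine, just to illustrate the idea): if the app only ever sends the system message plus the last few turns at inference time, the training samples could be trimmed the same way.

def trim_context(messages, max_turns=4):
    # Keep the system message (if any) plus the last `max_turns` messages,
    # so the training context length matches what the model sees in production.
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]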

You are right. The example is perhaps not the best fit for the question I asked.

I simply put it together from the samples provided in the fine-tuning documentation:

Fine-tuning - OpenAI API

What I meant to illustrate with it is simply having more than one message from the "assistant" role. I wonder whether the actual training is done on each assistant message or just on the last one.

My guess is that it's done only on the last message, so that it would basically work like the old completions API, where the last message is the completion and everything before it is the prompt; but I'm not completely sure, hence the question.

There is not really any "training" to training: the training system will try to optimise the weights such that, given an input similar to the training question, it will give an answer similar to the training answer.

Now, that is vastly oversimplifying what's going on, but that's the overview of it. So if the examples you give contain logically consistent similarities and rules, the model will tend to follow those new rules; randomness tends to cancel out, and what's left is hopefully the distilled essentials.

As far as answering your question goes, experimentation will be the best course of action. The reality is that this is all new: almost everything people are trying at the moment has never been done before, or, if it has, it has not been documented in a way that made it public knowledge. I'm interested to find out whether there is some bias towards the end of the string (the last response) or not.


I recognize the need for experimentation to determine best practices. Nevertheless, obtaining some technical details would be beneficial for a general understanding of what might be worth trying.

These models function by predicting tokens. My understanding is that fine-tuning works so that the model learns to predict only selected tokens from a sample via backpropagation, while the rest of the data is provided purely as input to condition those predictions; the model is not actively trained to predict those tokens.
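
OpenAI hasn't published its training internals, but this is how open-source fine-tuning code commonly does it: loss masking, where context tokens still pass through the model as input but are excluded from the loss. A minimal PyTorch-style sketch, assuming the common convention of -100 as the ignore index:

import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, target_mask):
    # logits: (seq_len, vocab_size) model outputs for one example
    # input_ids: (seq_len,) token ids of the full conversation
    # target_mask: (seq_len,) True for tokens to predict (assistant replies),
    #              False for context tokens (system/user messages)
    labels = input_ids.clone()
    labels[~target_mask] = -100  # positions set to -100 are ignored by the loss
    # Shift by one so position t predicts token t+1 (standard causal LM training).
    return F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)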

If only the last assistant message is actively trained on, this would mean that, given a long conversation with, for example, 30 user and 30 assistant messages, it would be a valid technique to make 30 samples from it (one sample per assistant message; see the sketch below), especially if one runs low on samples.
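
A hypothetical splitting helper (the function name is mine, not from any SDK) could look like this:

def split_conversation(messages):
    # One sample per assistant message: everything up to and including that
    # message becomes a training example, with the earlier messages as context.
    return [
        {"messages": messages[: i + 1]}
        for i, m in enumerate(messages)
        if m["role"] == "assistant"
    ]

Applied to a conversation with 30 user and 30 assistant messages, this yields 30 samples.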

From the OpenAI cookbook on fine-tuning chat models:

During the training process this conversation will be split, with the final entry being the completion that the model will produce, and the remainder of the messages acting as the prompt. Consider this when building your training examples - if your model will act on multi-turn conversations, then please provide representative examples so it doesn’t perform poorly when the conversation starts to expand.

(and again an example where just the prompt gives the desired results)
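
In code terms, that description amounts to something like the following (illustrative only, not the actual training code):

def to_prompt_and_completion(example):
    # Per the cookbook: the final entry is the completion the model is trained
    # to produce; all the preceding messages act as the prompt.
    prompt = example["messages"][:-1]
    completion = example["messages"][-1]
    return prompt, completion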


Thank you very much. This is exactly what I was looking for :+1: