How does gpt-3.5-turbo fine-tuning work?

Recently, I’ve been attempting to fine-tune gpt-3.5-turbo, but I’ve found myself a bit puzzled while going through the relevant documentation. The documentation provides this format as an example:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

This is fairly straightforward. Since each example contains only a single conversation pair, the assistant message serves as the “target value” in the training process. However, I’d like to include more context in the training set, which would involve more than three messages and potentially imperfect role alignment, like this:

{
    "messages": [
        {"role": "system", "content": "..."}, 
        {"role": "user", "content": "..."}, 
        {"role": "assistant", "content": "..."},
        {"role": "user", "content": "..."}, 
        {"role": "assistant", "content": "..."},
        {"role": "assistant", "content": "..."},
        {"role": "user", "content": "..."}, 
    ]
}

Or this:

{
    "messages": [
        {"role": "system", "content": "..."}, 
        {"role": "assistant", "content": "..."},
        {"role": "assistant", "content": "..."},
        {"role": "assistant", "content": "..."},
        ...
    ]
}

In such cases, how will the training program handle these types of samples? I’m eager to find out.

A sequence of only assistant roles would not normally occur in a conversation. You could try user roles with nothing in the content section; I am also interested in your findings.
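Something like this, purely as a guess at a workaround (placeholders as in your example), with empty-content user messages slotted between the assistant turns:

{"messages": [{"role": "system", "content": "..."}, {"role": "assistant", "content": "..."}, {"role": "user", "content": ""}, {"role": "assistant", "content": "..."}, {"role": "user", "content": ""}, {"role": "assistant", "content": "..."}]}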

Rather, the question is: how will the trained AI handle your application’s normal chatbot turns and conversation history when you’ve trained it on a mix of roles it will never see from that application, and when the fine-tuning examples don’t correctly show how to follow conversational context?

More intriguing is that they’ve only shown these three-message examples, always with a system message. What about behavior with no “system” role in the fine-tuning data? Or what about when you want the trained behavior to show up far later in a conversation than right after the system prompt? More things you have to pay to experiment with.

Not correct. The input to a chat API call can be a very long conversation history, with both user and assistant roles as prior inputs and responses. You are training the AI on how to respond to what it receives.

For example, you might tune the AI on how to summarize or rewrite what it just produced (or other skills it doesn’t already have) by supplying a longer conversation with some history. The desired response can then be shaped by the prior assistant-role contents together with the instruction.
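For instance, a single training line might look something like this (the content is invented, just to show the shape); the earlier turns are the history, and the final assistant message carries the rewriting behavior you want the model to pick up:

{"messages": [{"role": "system", "content": "You are a concise technical assistant."}, {"role": "user", "content": "Explain how JSONL differs from JSON."}, {"role": "assistant", "content": "JSONL stores one JSON object per line, so large datasets can be streamed and appended without wrapping everything in a single array."}, {"role": "user", "content": "Now rewrite that answer as one short sentence."}, {"role": "assistant", "content": "JSONL is just one JSON object per line, which makes streaming and appending easy."}]}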

How would a typical conversation with the model contain many assistant messages one after the other? Sure, you could train the model on anything; I’m simply pointing out that a “typical” training set is: user message <some text>, then assistant message <some text>, rinse and repeat, with the possible inclusion of a system message. Any of those messages could be blank, but not “missing”.

How would a typical conversation with the model contain many assistant messages one after the other? Through vector-based context management, the kind ChatGPT uses, which drops turns that don’t convey information.

Also through injecting knowledge-base data as augmentation before the user question.

You seem to be assuming that ChatGPT uses vector storage of prior chat text; no such evidence exists, at least none that I am aware of.

I have such evidence, but the nicely formatted chat history dump by role is also full of jailbreaky stuff to make it happen.

How can I fine-tune ChatGPT to make AI responses more vivid or logical?

You would have to give it examples of what you find vivid and logical; the more, the better. They should be in the form of question-and-answer pairs.
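For example, each training line could pair a question with an answer written in the style you want back (the content here is made up, only to show the shape):

{"messages": [{"role": "system", "content": "You answer vividly and with clear reasoning."}, {"role": "user", "content": "Why does the sky turn red at sunset?"}, {"role": "assistant", "content": "At sunset the light takes a long, low path through the atmosphere; the blue gets scattered away along the way, leaving the reds and oranges to reach your eyes."}]}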

Can you give an example, such as how I can fine-tune on multiple rounds of conversation in sales? Thank you very much.

Make the customer the user role and the salesperson the assistant role, and use blocks of conversation per training element: start with introductions, then a mid-game section, then a closing section, for lots of sales examples.
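As a rough sketch, one training element from the “introductions” block might look like this (the dialogue is invented, just to show the structure):

{"messages": [{"role": "system", "content": "You are a friendly salesperson for an invoicing tool."}, {"role": "user", "content": "Hi, I saw your ad. What does your product actually do?"}, {"role": "assistant", "content": "Great to meet you! In short, it automates your invoicing. What does your current billing process look like?"}, {"role": "user", "content": "Mostly spreadsheets, honestly."}, {"role": "assistant", "content": "Then you'd probably save a few hours a week. Would a short demo this week be useful?"}]}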

Note that the AI will not learn the contents of the sales pitches or the interactions; instead, it will learn how the assistant spoke, so it will have the same style.