OpenAI Fine-Tuning: Multi-turn Dataset Examples

I’m new to OpenAI model fine-tuning. I am fine-tuning gpt-3.5-turbo for a conversational chatbot. This is part of a series of questions I’ve quickly collected as I’m learning this alchemy. This one is about multi-turn training data.

TL;DR: Multi-turn training data: yea or nay?

For example (JSONL unspooled for clarity):

    {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant, but not too helpful."
            }, {
                "role": "user",
                "content": "Can you give me some advice on starting a new project?"
            }, {
                "role": "assistant",
                "content": "Of course, I'd be happy to help. What's the project about?"
            }, {
                "role": "user",
                "content": "It's a mobile app for language learning."
            }, {
                "role": "assistant",
                "content": "That sounds interesting! What specific features are you planning to include in the app?"
            }
        ]
    }
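Since the example above is unspooled for readability, here is a minimal Python sketch of how such a conversation could be written as the single line a JSONL training file expects. The helper name `to_jsonl_line` is hypothetical and only for illustration; the sketch uses nothing beyond the standard `json` module.

```python
import json

# One multi-turn training example, same content as the unspooled JSON above.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant, but not too helpful."},
        {"role": "user", "content": "Can you give me some advice on starting a new project?"},
        {"role": "assistant", "content": "Of course, I'd be happy to help. What's the project about?"},
        {"role": "user", "content": "It's a mobile app for language learning."},
        {"role": "assistant", "content": "That sounds interesting! What specific features are you planning to include in the app?"},
    ]
}


def to_jsonl_line(record):
    """Serialize one training example as a single line (hypothetical helper)."""
    # json.dumps escapes any newlines inside strings, so the result stays on one line.
    return json.dumps(record, ensure_ascii=False)


line = to_jsonl_line(example)
print(line)
```

Each conversation would go on its own line of the file, with no trailing comma between records.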

I’ve read conflicting advice on this. One doc said that only single-turn examples (a user prompt followed by an assistant response) are necessary or desirable.

One doc from OpenAI said that all of the prompts except the last one would be collapsed, serving as the prompt for the assistant response.

Another doc (perhaps something in the OpenAI workbooks? Maybe something on the OpenAI forums?) said that multi-turn examples are necessary for applications that require multi-turn conversations, such as a chatbot. It went on to say that dataset examples should in fact have a length that represents the number of turns expected in a conversation. This raises some genuinely problematic questions.

Which is it and what is the best practice here? Bonus points if you cite your sources.

Hey, I am learning too. I'd love to collaborate with you; let me know.

I’d say “yes”, but only if you have something to add to the extensive chat ability the model already has from its built-in training, or if the type of fine-tune you are doing would otherwise break that chat behavior.

Nothing gets “collapsed”. The model only sees what your chat management code sends it, however many turns that is.
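To make that concrete, here is a rough sketch of the kind of chat management being described: application code decides how many past messages go into each request. The function name `build_request_messages` and the `max_messages` parameter are made up for illustration and are not part of any OpenAI library.

```python
def build_request_messages(system_prompt, history, max_messages=6):
    """Assemble the messages a chat manager would send for the next reply.

    history is a list of {"role": ..., "content": ...} dicts, oldest first.
    Only the most recent max_messages entries are kept (hypothetical policy).
    """
    # Guard against max_messages == 0, because history[-0:] would return everything.
    recent = history[-max_messages:] if max_messages > 0 else []
    return [{"role": "system", "content": system_prompt}] + recent


history = [
    {"role": "user", "content": "Can you give me some advice on starting a new project?"},
    {"role": "assistant", "content": "Of course, I'd be happy to help. What's the project about?"},
    {"role": "user", "content": "It's a mobile app for language learning."},
]

request = build_request_messages(
    "You are a helpful assistant, but not too helpful.", history, max_messages=2
)
for message in request:
    print(message["role"], "->", message["content"])
```

With `max_messages=2` the request holds the system message plus the last assistant and user messages, so the earliest user turn is dropped. If an application trims history this way, training examples shaped like the trimmed requests would presumably match what the model sees at inference time.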

One scenario I could see is preparing the AI to answer with unusual context you might provide: for example, if you inject session summaries, game state, or knowledge, or if you omit half of the user/AI pairings because you retrieve chat history from a vector database or simply prefer to leave out the AI's side of the context.


Definitely okay to provide multiple messages: https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

Each example in the dataset should be a conversation in the same format as our Chat completions API, specifically a list of messages where each message has a role, content, and optional name. At least some of the training examples should directly target cases where the prompted model is not behaving as desired, and the provided assistant messages in the data should be the ideal responses you want the model to provide.
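As a rough illustration of the format described in that passage, the sketch below checks one JSONL line for a `messages` list whose entries each have a `role`, a string `content`, and at most an extra `name`. The function `check_example`, the allowed role set, and the requirement of at least one assistant message are assumptions made for this example, not rules taken from the documentation.

```python
import json

# Assumption: only the roles that appear in this thread are allowed here.
ALLOWED_ROLES = {"system", "user", "assistant"}
ALLOWED_KEYS = {"role", "content", "name"}


def check_example(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string content")
        extra = set(msg) - ALLOWED_KEYS
        if extra:
            problems.append(f"message {i} has unexpected keys {sorted(extra)}")

    # Assumption: an example without any assistant message gives nothing to learn from.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message in example")
    return problems


good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}'
print(check_example(good))
```

A multi-turn example passes this check just like a single-turn one, since the format is simply a list of messages of any length.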

One would presume that if “multi-turn” examples were discouraged, this part of the documentation would say so explicitly.

Regarding “single turn” or collapsing, you might have run into legacy tuning docs. Disregard those for 3.5-turbo. :+1:


Thank you for both answering the question and citing sources. :clap:

In case you need data for fine-tuning 3.5, there is a free dataset here (huggingface/bitext) with more than 26K rows, specific to chatbots.