Is it possible to do multi-turn preference fine-tuning?

utilityfog · March 21, 2025, 10:20am

Is there a syntax that allows us to do multi-turn preference fine-tuning?

Basically, a way to combine:

[
      {"role": "user", "content": "What's the weather like today?"},
      {"role": "assistant", "content": "It's sunny and 25 degrees Celsius."},
      {"role": "user", "content": "Okay, thanks."},
      {"role": "assistant", "content": "You're welcome!"},
      {"role": "user", "content": "What's the capital of France?"},
      {"role": "assistant", "content": "Paris."},
      {"role": "user", "content": "Great, thanks."},
      {"role": "assistant", "content": "You're welcome!"}
    ]

and

{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "Hello, can you tell me how cold San Francisco is today?"
      }
    ],
    "tools": [],
    "parallel_tool_calls": true
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": "Today in San Francisco, it is not quite cold as expected. Morning clouds will give away to sunshine, with a high near 68°F (20°C) and a low around 57°F (14°C)."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": "It is not particularly cold in San Francisco today."
    }
  ]
}

Topic		Replies	Views
Fine-tuning training data with multiple ideal responses API fine-tuning	1	41	August 22, 2024
Fine tuning a model with multiple parameters API assistants-api	0	66	November 22, 2024
Fine tuning model on full chats instead of a prompt response API gpt-35-turbo , fine-tuning , api	4	877	November 26, 2023
OpenAI Fine-Tuning: Multi-turn Dataset Examples API openapi , fine-tuning , gpt-3	6	8815	December 14, 2023
How does gpt-3.5-turbo fine-tuning work? API gpt-35-turbo , fine-tuning	10	1885	September 11, 2023

Is it possible to do multi-turn preference fine-tuning?

Related topics