Recently, I’ve been attempting to fine-tune gpt-3.5-turbo
, but I’ve found myself a bit puzzled while going through the relevant documentation. The documentation provides this format as an example:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
This is fairly straightforward. As there’s only a single pair of conversation, the assistant’s role serves as the “target value” in the training process. However, I’d like to include more context in the training set, which would involve more than 3 messages and potentially imperfect alignment, like this:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
{"role": "assistant", "content": "..."},
{"role": "user", "content": "..."},
]
}
Or this:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "assistant", "content": "..."},
{"role": "assistant", "content": "..."},
{"role": "assistant", "content": "..."},
...
]
}
In such cases, how will the training program handle these types of samples? I’m eager to find out.