I’ve been hesitating to ask this because I’m still learning the vocabulary, limitations, and possibilities of fine-tuning the models. But if you’ll bear with me, I’ll fumble through my question.
I am creating an informational chatbot by fine-tuning gpt-3.5-turbo. On one hand, it has important information to impart; on the other, it should be chatty and entertaining.
My training data was intended to serve two purposes: to give the model a flavor of the conversational style I am looking for (mined from emails, chats, etc.), and to train it to speak accurately about important facts it should know about the space.
Example from the flavor/chat dataset:
{
"messages": [
{"role": "system", "content": "<general info about classes>"},
{"role": "user", "content": "I'm dealing with relationship issues which are putting a bit of a damper on school"},
{"role": "assistant", "content": "Good luck! That can be hard."}
]
}
Example from the informational dataset:
{
"messages": [
{"role": "system", "content": "<general info about project>"},
{"role": "user", "content": "Are there pontoons under the shantyboat?"},
{"role": "assistant", "content": "No. Dotty is a barge-bottomed flat-hulled boat made of fiberglass over plywood. It has two skegs (like very short long keel) that run the length of the boat. It is powered by a pretty fancy 30hp Mercury 4-stroke outboard steered from the pilot's cabin. It floats like a champ."}
]
}
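For what it's worth, before uploading I run each JSONL line through a little sanity check of my own (a homemade sketch, not an official OpenAI validator) to make sure every example matches the chat fine-tuning format:

```python
import json

# Roles allowed in the chat fine-tuning format.
VALID_ROLES = {"system", "user", "assistant"}

def check_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line; empty means it looks OK."""
    problems = []
    try:
        example = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: bad role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    # A training example should end with the assistant turn the model learns from.
    if messages and messages[-1].get("role") != "assistant":
        problems.append("last message should be from the assistant")
    return problems
```

Both examples above pass this check, so I don't think the data format itself is the problem.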
I have nearly 500 examples in my dataset, just about balanced between informational and stylistic, though that’s a fuzzy line.
It excels at the stylistic side, but gets, uh, really creative on the factual side, making up facts from whole cloth. If I dial up the temperature, it is more creative and entertaining (as expected), but if I dial the temperature down, it gets robotic, even repeating responses word for word. Adjusting the frequency_penalty parameter doesn't change this.
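To make the knob-turning concrete, here is a sketch of the two decoding settings I've been flipping between at inference time (the parameter names are the standard Chat Completions sampling parameters; the specific values are just the ones I've been experimenting with):

```python
def sampling_params(factual: bool) -> dict:
    """Pick decoding settings depending on whether the reply needs to be factual."""
    if factual:
        # Low temperature: more deterministic, but in my experience
        # robotic and prone to repeating responses word for word.
        return {"temperature": 0.2, "frequency_penalty": 0.0}
    # Higher temperature: chattier and more entertaining,
    # but more likely to invent facts from whole cloth.
    return {"temperature": 0.9, "frequency_penalty": 0.5}
```

The catch, of course, is that I can't know ahead of time whether an incoming user message is "factual" or "chatty", which is really what my question is about.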
So my fumbly question is this: How can I tune the model to offer accurate information while still allowing creative latitude for non-factual issues?