Fine-tuning models: Accurate info vs. creative latitude

I’ve been hesitating to ask this because I’m still learning the vocabulary, limitations, and possibilities of fine-tuning the models. But if you’ll bear with me, I’ll fumble through my question.

I am creating an informational chatbot by fine-tuning gpt-3.5-turbo. On one hand, it has important information to impart, and on the other hand, it is also chatty and entertaining.

My training data was intended to serve two purposes: to give the model a flavor of the conversational style I am looking for (mined from emails, chats, etc.), and to train it to speak accurately about important facts it should know about the space.

Example from the flavor-text (stylistic) dataset:

{
  "messages": [
    {"role": "system", "content": "<general info about classes>"},
    {"role": "user", "content": "I'm dealing with relationship issues which are putting a bit of a damper on school"},
    {"role": "assistant", "content": "Good luck! That can be hard."}
  ]
}

Example from the informational dataset:

{
  "messages": [
    {"role": "system", "content": "<general info about project>"},
    {"role": "user", "content": "Are there pontoons under the shantyboat?"},
    {"role": "assistant", "content": "No. Dotty is a barge-bottomed flat-hulled boat made of fiberglass over plywood. It has two skegs (like very short long keel) that run the length of the boat. It is powered by a pretty fancy 30hp Mercury 4-stroke outboard steered from the pilot's cabin. It floats like a fucking champ."}
  ]
}
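For reference, each example like the two above goes on its own line of a JSONL training file. Here's a minimal sketch of assembling and sanity-checking that file before uploading it (the file name and helper function are my own, and the examples are abbreviated):

```python
import json

# Hypothetical sketch: assemble chat examples into the JSONL file the
# fine-tuning API expects -- one JSON object per line, each holding a
# "messages" list of role/content dicts. File name is made up.
examples = [
    {"messages": [
        {"role": "system", "content": "<general info about classes>"},
        {"role": "user", "content": "I'm dealing with relationship issues..."},
        {"role": "assistant", "content": "Good luck! That can be hard."},
    ]},
    {"messages": [
        {"role": "system", "content": "<general info about project>"},
        {"role": "user", "content": "Are there pontoons under the shantyboat?"},
        {"role": "assistant", "content": "No. Dotty is a barge-bottomed flat-hulled boat."},
    ]},
]

def write_jsonl(examples, path):
    with open(path, "w") as f:
        for ex in examples:
            # sanity-check each example before it hits the training file
            assert set(ex) == {"messages"}
            for msg in ex["messages"]:
                assert msg["role"] in {"system", "user", "assistant"}
                assert isinstance(msg["content"], str)
            f.write(json.dumps(ex) + "\n")

write_jsonl(examples, "training.jsonl")
```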

I have nearly 500 examples in my dataset, just about balanced between informational and stylistic, though that’s a fuzzy line.

It excels at the stylistic side, but gets, uh, really creative with the facts, making them up from whole cloth. If I dial up the temperature, it is more creative and entertaining (as expected), but if I dial down the temperature it gets robotic, even repeating responses word for word. The frequency_penalty param has no effect on this.
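That low-temperature behavior makes mechanical sense: temperature divides the logits before sampling, so a low value collapses nearly all probability onto the single most likely token (hence word-for-word repeats), while a high value flattens the distribution. A toy illustration of the math, not the API:

```python
import math

# Toy illustration: temperature scales logits before softmax. Low
# temperature sharpens the distribution toward the top token
# (deterministic, repetitive); high temperature spreads it out (creative).
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # probability spreads out
```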

So my fumbly question is this: How can I tune the model to offer accurate information while still allowing creative latitude for non-factual issues?

You would still need to incorporate RAG into your pipeline even if you have fine-tuned the model with information.
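The retrieval half can be quite simple: vectorize your fact snippets, find the closest one to the user's question, and inject it into the system message before calling the fine-tuned model. A toy sketch with word-count vectors standing in for a real embedding model (all names are hypothetical):

```python
import math
import re
from collections import Counter

# Toy RAG sketch (names hypothetical). A real pipeline would use an
# embedding model; bag-of-words vectors stand in here.
facts = [
    "Dotty is a barge-bottomed flat-hulled boat made of fiberglass over plywood.",
    "The boat is powered by a 30hp Mercury 4-stroke outboard.",
]

def vectorize(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs):
    q = vectorize(question)
    return max(docs, key=lambda d: cosine(q, vectorize(d)))

def build_messages(question, docs):
    # inject the retrieved fact into the system prompt so the fine-tuned
    # model keeps its voice but grounds the answer in a known fact
    fact = retrieve(question, docs)
    return [
        {"role": "system", "content": "Answer using this fact: " + fact},
        {"role": "user", "content": question},
    ]

msgs = build_messages("What kind of outboard powers the boat?", facts)
```

The fine-tuned model then supplies the tone; the retrieved fact supplies the accuracy.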

How were you thinking of triggering the retrieval? I tried functions, but they are still a bit buggy with fine-tuned models.

I’m open to other suggestions as well.
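For what it's worth, the shape I had in mind was a single tool the model can call when a question looks factual, with the handler falling back to the model's own (creative) reply otherwise. A sketch of that dispatch, with hypothetical names like lookup_fact:

```python
import json

# Sketch of tool-triggered retrieval (lookup_fact and the handler are
# hypothetical). The tool definition follows the chat "tools" schema;
# the model decides per-message whether a question warrants calling it.
lookup_tool = {
    "type": "function",
    "function": {
        "name": "lookup_fact",
        "description": "Look up a factual detail about the shantyboat project.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to look up"},
            },
            "required": ["query"],
        },
    },
}

def handle_reply(message, retrieve):
    # if the model requested the tool, run retrieval and answer from it;
    # otherwise the model's own (creative) reply stands
    calls = message.get("tool_calls")
    if not calls:
        return message["content"]
    args = json.loads(calls[0]["function"]["arguments"])
    return retrieve(args["query"])
```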