What is the best way to get the GPT-3.5 model to learn specific data using fine-tuning? How many messages should be created for each specific data point? For example, what would the JSONL file look like for it to learn a fact like “The Superbowl champion in 2024 was the Kansas City Chiefs”?
You have to provide example conversations. You gave the same system prompt, the user put in a question organically the way a user would write - then how should the AI respond?
Note that if the answering is too focused or doesn’t have enough diversity, you may have an AI that can’t infer, or that sees the output as a pattern and can answer about the 2040 Cricket finals with hallucination.
Plus, generally facts are best obtained from a knowledge search tool which can return the exact text.
While I was typing, synthetic training was being generated…
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Who won the Superbowl in 2024?”},{“role”:“assistant”,“content”:“The Superbowl champion in 2024 was the Kansas City Chiefs.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Can you tell me which team was victorious in the 2024 Superbowl?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs were victorious in the 2024 Superbowl.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Which NFL team claimed the Superbowl title in 2024?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs claimed the Superbowl title in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Who clinched the Superbowl trophy in 2024?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs clinched the Superbowl trophy in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“In 2024, which team won the NFL championship game?”},{“role”:“assistant”,“content”:“In 2024, the Kansas City Chiefs won the NFL championship game.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“What team was the 2024 Superbowl champion?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs were the 2024 Superbowl champions.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Which team topped the NFL by winning the 2024 Superbowl?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs topped the NFL by winning the 2024 Superbowl.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Who took home the Superbowl title in 2024?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs took home the Superbowl title in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Can you tell me the winner of the 2024 Superbowl game?”},{“role”:“assistant”,“content”:“The winner of the 2024 Superbowl game was the Kansas City Chiefs.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Who emerged as the champions of the Superbowl in 2024?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs emerged as the champions of the Superbowl in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Identify the team that secured the Superbowl victory in 2024.”},{“role”:“assistant”,“content”:“The Kansas City Chiefs secured the Superbowl victory in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Who grabbed the 2024 Superbowl championship?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs grabbed the 2024 Superbowl championship.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Tell me the team that dominated the Superbowl in 2024.”},{“role”:“assistant”,“content”:“The team that dominated the Superbowl in 2024 was the Kansas City Chiefs.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Which team finished as the Superbowl champs in 2024?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs finished as the Superbowl champs in 2024.”}]}
{“messages”:[{“role”:“system”,“content”:“You are an AI sports info system for the True Fans website.”},{“role”:“user”,“content”:“Which team was the last one standing in the 2024 Superbowl?”},{“role”:“assistant”,“content”:“The Kansas City Chiefs were the last one standing in the 2024 Superbowl.”}]}
{“messages”:[{"
Do you mean RAG, for example?
What do you mean with “synthetic training”?
Thank you very much for your quick response.
Synthetic is commonly used in machine learning to refer to inputs or corpus that were not created by a natural person.
In this case, an AI was instructed by me on the training format and the goal of the examples, and produced a set of JSONL question/answer pair examples to instill answering capability by fine-tuning (that was not as diverse as one might hope).
The presentation of concrete facts for reproduction into prompting, rather than just tuned behaviors that shape the style of responses, can be automatic semantic input-based knowledge augmentation (RAG), or can be a more traditional search performed by the primary or a supervisory AI. Whatever gives the AI reliable context input about Superbowls for entailment (instead of fine-tuning showing the AI a pattern of “team name won”, where it might infer plausible answers even after giving it every game played.)