How to make a chatbot reply with the training data completion?

Hi, I have tried to fine-tune a model, and this is my small training data in JSONL:

{"prompt": "What's your name?", "completion": "My name is PTO."}
{"prompt": "Who made you?", "completion": "I was made by Adabo team."}
{"prompt": "Where do you live?", "completion": "As a virtual assistant I have no physical address."}
{"prompt": "What's the price of the service?", "completion": "It costs 10000 Ariary"}

After the fine-tune completed, I asked my chatbot “What’s your name?”, but it did not respond “My name is PTO” (the completion I added in the training data).
What am I doing wrong?
My use case is that I want the bot to respond with what I added in the fine-tune file.

Hello Rariny,

I had a similar experience. It appears that below a certain number of training examples, fine-tuning doesn’t give the expected results. I would recommend trying with at least 100 prompt-completion pairs. If it still doesn’t work correctly, I suggest trying another model (I am not sure which model you are trying to fine-tune, but I would expect some to be better suited than others for this task).
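Another aspect to explore is n_epochs, the number of passes over the training data. The smaller the dataset, the more rigid the learning should be (that is, more epochs, so the model sticks closer to the completions).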

Finally, I recommend you check out the fine-tuned_qa example in the openai/openai-cookbook repository on GitHub (openai-cookbook/examples/fine-tuned_qa at commit f607de50cb9ee75e68f10f13b7b870aac721e66a).
=> This is a notebook by the OpenAI team about fine-tuning and Q&A chatbots.
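
Before retraining, it may help to confirm that the file really is valid JSONL and to count how many examples it contains. Below is a minimal sketch using only the Python standard library. The function name check_jsonl and the sample lines are made up for illustration, and the check assumes the prompt/completion format shown in the question. Note that a trailing comma after an object, or curly quotes instead of straight ones, makes a line fail to parse.

```python
import json

REQUIRED_KEYS = {"prompt", "completion"}

def check_jsonl(lines):
    """Return (valid_count, errors) for an iterable of JSONL lines.

    Hypothetical helper: it only checks that each non-blank line is a
    JSON object with exactly the keys 'prompt' and 'completion'.
    """
    valid = 0
    errors = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or set(record) != REQUIRED_KEYS:
            errors.append(f"line {number}: expected keys 'prompt' and 'completion'")
            continue
        valid += 1
    return valid, errors

# Illustrative sample: one good line, one with a trailing comma,
# one with curly quotes.
sample = [
    '{"prompt": "Who made you?", "completion": "I was made by Adabo team."}',
    '{"prompt": "Where do you live?", "completion": "I have no physical address."},',
    '{“prompt”: “Who made you?”, “completion”: “I was made by Adabo team.”}',
]
count, problems = check_jsonl(sample)
print(count, "valid example(s)")
for problem in problems:
    print(problem)
```

To check a real file, something like `check_jsonl(open("train.jsonl", encoding="utf-8"))` should work, where train.jsonl is a placeholder file name.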

Thank you for the clarification! Is it just a matter of copy-pasting the existing data, or should I create other examples apart from the ones in the file?

My pleasure.
I am not sure I understand your last question.

  • If you want to use the existing data they use in the example, you can download it (they provide a link in one of the notebooks, cf. my previous post). If I recall correctly, it’s over 4000 prompts and completions.
  • If you want to build a chatbot that answers “My name is PTO” to the question “What is your name”, you have to fine-tune it with additional prompts and completions.
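
One possible way to produce additional prompts and completions is to write several phrasings of each question that all map to the same answer. This is only a sketch of that idea, and an assumption rather than something confirmed in this thread: the paraphrases and the helper names build_records and to_jsonl are made up for illustration. Using json.dumps for each line keeps the quoting valid.

```python
import json

# Hypothetical paraphrases: each answer is paired with several phrasings
# of the same question.
VARIANTS = {
    "My name is PTO.": [
        "What's your name?",
        "What is your name?",
        "What are you called?",
    ],
    "I was made by Adabo team.": [
        "Who made you?",
        "Who created you?",
    ],
}

def build_records(variants):
    """Expand {answer: [questions]} into a list of prompt/completion dicts."""
    return [
        {"prompt": question, "completion": answer}
        for answer, questions in variants.items()
        for question in questions
    ]

def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line, no commas between lines."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"

records = build_records(VARIANTS)
print(to_jsonl(records), end="")
```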

Alternatively, if it’s only a matter of 5-10 elements, you might prefer to include those in each prompt rather than creating a new model.
In other words, the prompt can be like:
“Answer the question truthfully. First, check if the answer is in the text below: \n TEXT.”
And in the TEXT you can put: “My name is PTO. I was made by the Abado team etc.”

Or “You are a chatbot made by the Abado team. Your name is PTO …”

It will be a bit more expensive (as you add this info to each prompt) but for less than 10 data points, I think it’s more efficient this way.
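
The inline approach described above can be sketched as a small helper that prepends the same facts to every question. This is a rough illustration, not a definitive implementation: build_prompt and FACTS are hypothetical names, and the trailing "Question: ... Answer:" layout is my assumption about how to present the user's question to the model. The last lines show where the extra cost comes from, since the fixed text is sent with every request.

```python
# Facts taken from the example above; extend as needed.
FACTS = "My name is PTO. I was made by the Abado team."

def build_prompt(question, facts=FACTS):
    """Return a prompt that embeds the known facts before the user's question."""
    return (
        "Answer the question truthfully. "
        "First, check if the answer is in the text below:\n"
        f"{facts}\n\n"
        # Assumed layout for the question itself (not from the thread).
        f"Question: {question}\nAnswer:"
    )

question = "What's your name?"
prompt = build_prompt(question)
print(prompt)

# The instruction and facts are repeated in every request, so this
# overhead (measured here in characters) is paid on each call.
overhead = len(prompt) - len(question)
print("extra characters per request:", overhead)
```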
Hope this helps.

I have tried the approach that adds the text inside the prompt, and yes, it’s simple, but it increases the cost, as you said. I will then try to add more data to my file and see if it makes sense. Thank you.

