I’m working on a training data set and trying to ensure it’s formatted the right way, but something doesn’t add up.
In the docs, the example format shows the prompt value ending with `Agent:`:
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
But that is not a unique suffix separator, so I get this warning from the Python CLI:
- All prompts end with suffix `\n\nAgent:`
WARNING: Some of your prompts contain the suffix `\nAgent:` more than once. We strongly suggest that you review your prompts and add a unique suffix
But that is the structure the docs recommend for a chatbot, isn’t it?
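One way to deal with the warning before uploading is to audit the JSONL yourself and make the final `Agent:` cue unique. The sketch below is my own approach, not an official one: the `###` marker I splice in is an arbitrary choice, and any token that never appears in the transcripts would work just as well.

```python
import json

# "\n\n###\nAgent:" is my own pick for a unique suffix -- the "###" marker
# appears nowhere else in the transcript, so the full suffix occurs only once.
CUE = "\nAgent:"
UNIQUE_SUFFIX = "\n\n###\nAgent:"

def fix_prompt(prompt: str) -> str:
    """Replace the trailing 'Agent:' cue with a unique suffix separator,
    but only when the cue also appears earlier in the prompt."""
    if prompt.endswith(CUE) and prompt.count(CUE) > 1:
        prompt = prompt[: -len(CUE)] + UNIQUE_SUFFIX
    return prompt

def fix_jsonl(lines):
    """Rewrite each JSONL training record so its prompt ends uniquely."""
    out = []
    for raw in lines:
        record = json.loads(raw)
        record["prompt"] = fix_prompt(record["prompt"])
        out.append(json.dumps(record))
    return out

example = json.dumps({
    "prompt": "Customer: hi\nAgent: hello\nCustomer: where is my order?\nAgent:",
    "completion": " It ships tomorrow.\n",
})
fixed = json.loads(fix_jsonl([example])[0])
print(fixed["prompt"].count(UNIQUE_SUFFIX))
```

If you change the suffix in the training data, remember to send the same suffix at the end of every prompt at inference time, or the fine-tuned model will not see the cue it was trained on.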
You should not use fine-tuning for a chatbot. Use gpt-3.5-turbo instead: the models you can fine-tune will not work well here, since they have no instruction-following or conversational training data.
Yes, you’re right. Currently we can only fine-tune the base models (ada, babbage, curie, or davinci), not the newer ones like text-*, GPT-3.5, or GPT-4.
If you are working on a chatbot or any Q&A conversation, you can combine embeddings with a text or chat completion model (choose one). A nice, simple way to begin is by reading and experimenting with the tutorials.
Then, once you’re used to how these combined models work, you can read through the discussions to get an idea of how embeddings play a role alongside the chat completion model.
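The core of the embeddings + completion combination can be sketched without any API calls. In the toy example below the vectors are dummy stand-ins for real embeddings (in practice you would get them from an embedding model such as text-embedding-ada-002 for both your documents and the user’s question); the retrieval step picks the most similar document and splices it into the prompt that the completion model would receive.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy stand-ins for real embedding vectors of two knowledge-base snippets.
docs = {
    "Orders ship within 2 business days.": [0.9, 0.1, 0.0],
    "Refunds take 5-7 days to process.":   [0.1, 0.9, 0.1],
}
# Pretend embedding of the question "When will my order ship?".
question_vec = [0.8, 0.2, 0.1]

# Retrieve the most similar document, then build the context-stuffed prompt
# that would be sent to the chat completion model.
best_doc = max(docs, key=lambda d: cosine(docs[d], question_vec))
prompt = (
    "Answer using only the context below.\n\n"
    f"Context: {best_doc}\n\n"
    "Question: When will my order ship?"
)
print(best_doc)
```

The point of the pattern is that the completion model never needs to be fine-tuned on your data: the embedding lookup supplies the relevant facts at request time, and the model only has to answer from the context you hand it.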