Did you find directional (curly) quotes and non-standard hyphens in your training data?
You can clean the data up and then run a continuation fine-tune for another epoch or two, and if you have held-out validation data, include that as well. At worst, you simply don’t use the new model.
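If it helps, here’s a minimal sketch of that clean-up-and-continue flow using the openai Python library; the file names, the `ft:` model ID, and the epoch count are placeholders for your own values:

```python
# Sketch only: normalize curly quotes/dashes in the training JSONL, then
# continue fine-tuning from the existing model. File names, the ft: model ID,
# and the epoch count are placeholders.
import json
from openai import OpenAI

REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en/em dashes
}

def normalize(text: str) -> str:
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text

with open("train.jsonl") as src, open("train_clean.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        for message in example["messages"]:
            message["content"] = normalize(message["content"])
        dst.write(json.dumps(example) + "\n")

client = OpenAI()
train = client.files.create(file=open("train_clean.jsonl", "rb"), purpose="fine-tune")
valid = client.files.create(file=open("validation.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # your existing fine-tune, to continue from it
    training_file=train.id,
    validation_file=valid.id,
    hyperparameters={"n_epochs": 2},
)
print(job.id, job.status)
```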
In specialized cases, a validation file might make it look like you are past the point of overfitting, yet actual compliance and user satisfaction keep rising for that application, and you can still improve generalization when the gaps in training lie between two types of your own questions.
OpenAI models have already been trained on millions and millions of training questions that are rated by tons of outsourced workers and then tuned with reinforcement learning.
You do not need to fine-tune gpt-3.5-turbo to be a customer support assistant, especially not on grammatically awkward chat that has little to do with the user input and uses dumb placeholders that only train the model to output placeholders.
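To make the placeholder point concrete, here’s an invented example of the kind of training line I mean (the `{PLACEHOLDER}` fields are made up); fine-tune on enough of these and the model mostly learns that every answer is a template:

```jsonl
{"messages": [{"role": "user", "content": "my order still hasnt arrived"}, {"role": "assistant", "content": "Dear {CUSTOMER_NAME}, we are sorry about {ISSUE}. Your ticket is {TICKET_ID}. Regards, {AGENT_NAME}"}]}
```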
It’s interesting what you say (“You do not need to fine-tune gpt-3.5-turbo to be a customer support assistant”). I’ve done some research to understand GPT and customer support, and there are various articles from experts in the field, like this one in Forbes:
that point at major problems. One of the critical ones is “It [ChatGPT] Provides Different Answers Every Time”, and the author probably has a point: an assistant that answers differently every time is hard to use in a business environment, and my own checks indicate this is often true.
ChatGPT, the web chatbot, provides different answers, and that’s by design. Not only is unexpected word use (instead of always the most probable token) seen as more human and inspired, it also lets OpenAI gather good and bad responses to the same questions.
However, in the API, we can control the exact sampling parameters. The AI can say the same thing 100 times to the same input if I want to pay for it. Or I can have little Timmy’s day be different every time the AI writes about it.
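As a small illustration (the model name and prompt are arbitrary), the chat completions API lets you pin sampling down for near-identical answers, or loosen it when you want variety; note that the `seed` parameter is best-effort, not a hard guarantee:

```python
# Sketch: the same request with sampling pinned down vs. left loose.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write one sentence about little Timmy's day."}]

# Near-deterministic: temperature 0 plus a fixed seed (seed is best-effort,
# not a hard guarantee of byte-identical output).
pinned = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
    seed=42,
)

# Varied: a higher temperature lets less probable tokens through.
loose = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=1.2,
)

print(pinned.choices[0].message.content)
print(loose.choices[0].message.content)
```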
And even if the AI starts a sentence with a different word or two, once it has a plan for what it is going to write, less likely token choices will rarely distract it from the topic.
There are near-infinite token combinations I could have used to write this reply, and who knows why I chose “Chat” as the first human-readable token, but the overall idea was fully formed by the input I was responding to.
You are right, it’s the changes in responses and the bad responses that concern us. Customer support is a sensitive area, and bad responses hurt the business more than in other services, like search for example. That’s the reason for fine-tuning with specific data: questions and (correct) answers are linked by the training.
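For reference, that linking happens through the chat fine-tuning file format, one JSON object per line, with each question sitting next to the exact answer it should learn (the company name and policy wording below are invented):

```jsonl
{"messages": [{"role": "system", "content": "You are Acme's support assistant."}, {"role": "user", "content": "Can I return a blender I bought 20 days ago?"}, {"role": "assistant", "content": "Yes. Acme accepts returns within 30 days of purchase with a receipt; the refund goes back to your original payment method within 5-7 business days."}]}
```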
“Bad”, in terms of OpenAI gathering training data through ChatGPT, means collecting comparative answers that are more or less satisfactory to the end user, and then to the knowledge workers who refine the training data.
“Bad” for you might be a bot that was fine-tuned on certain behaviors but doesn’t have your company policies set in stone, which would be possible if the AI could call a function to search your business manuals (a rough sketch of that follows).
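Here is what that could look like with the chat completions `tools` parameter; `search_policy_manual` is a hypothetical function you would implement against your own documentation, not something built in:

```python
# Sketch: let the model request a policy lookup before answering.
# search_policy_manual and its backing search are hypothetical, not built in.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_policy_manual",
        "description": "Look up the company's written policy on a support topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Topic to search, e.g. 'refund window'"},
            },
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or your fine-tuned ft: model
    messages=[
        {"role": "system", "content": "Answer support questions only from retrieved policy text."},
        {"role": "user", "content": "I want a refund on last month's order."},
    ],
    tools=tools,
)

# If the model asked for the tool, run your own search and send the result back
# in a follow-up "tool" message before letting it answer the customer.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```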
Like if this bot, without such grounding, went along with the user and promised a refund, disregarding your company policy: