I am trying to find documentation or examples of what should go in the validation_file used when creating a fine-tuned model.
The API docs point me to the fine-tuning guide on OpenAI Platform.
The fine-tuning guide points me back to the API reference.
Any pointers would be greatly appreciated. I am using ‘gpt-3.5-turbo’ and the chat completions format in the training data.
The validation file should be the same type of example conversations as you are training on, in the same type of file format.
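For chat models this means one JSON object per line, each with a "messages" list of system/user/assistant turns, exactly like the training file. Below is a minimal sketch of a local sanity check you could run on either file before uploading. It is a simplified check that I wrote for illustration: the function name is hypothetical, and it only accepts the three basic roles with string content (function-calling messages are not handled).

```python
import json

# Roles accepted by this simplified check (function-calling roles are not handled).
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_chat_lines(lines):
    """Return a list of problems in chat-format JSONL lines (empty list if none).

    `lines` can be any iterable of strings, including an open file object.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing 'messages' list")
            continue
        for msg in messages:
            # Each turn needs a known role and a string content field.
            if (not isinstance(msg, dict)
                    or msg.get("role") not in ALLOWED_ROLES
                    or not isinstance(msg.get("content"), str)):
                problems.append(f"line {lineno}: bad message {msg!r}")
                break
        else:
            # An example with no assistant turn gives the model nothing to learn.
            if not any(m["role"] == "assistant" for m in messages):
                problems.append(f"line {lineno}: no assistant reply")
    return problems
```

Usage, assuming a local file named validation.jsonl: `with open("validation.jsonl", encoding="utf-8") as f: print(check_chat_lines(f))`.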
The held-out examples should be representative enough that you could shuffle all your examples randomly and put any 10% of them into the validation file.
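A minimal sketch of that random 90/10 split, assuming your examples are already loaded as a list of chat records. The function names and file names are hypothetical; the fixed seed just makes the split reproducible.

```python
import json
import random


def split_examples(examples, holdout_fraction=0.1, seed=0):
    """Shuffle a copy of `examples` and return (train, validation) lists."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_val = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, records):
    """Write one JSON object per line, the layout the training file also uses."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Usage: `train, val = split_examples(all_examples)`, then `write_jsonl("train.jsonl", train)` and `write_jsonl("validation.jsonl", val)`. Because the validation examples never appear in the training file, their loss tells you something the training loss cannot.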
The validation file gives you a second metric during fine-tuning: not just how far learning on the training set has progressed, but how well the model handles similar questions it was not trained on.
There can be a point of over-training or over-specialization where the model no longer performs as well on similar questions it has not seen before, because it has been fine-tuned to reproduce only what you gave it.
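One way to spot that point is to compare the two loss curves: training loss keeps falling while validation loss bottoms out and starts rising. The sketch below assumes you already have per-step training and validation loss values as two equal-length lists. How you obtain them from the fine-tuning job is not shown, and the function name and `tolerance` parameter are my own illustration, not part of any API.

```python
def overfitting_report(train_loss, valid_loss, tolerance=0.0):
    """Compare per-step training and validation loss (same steps, same length).

    Returns (best_step, overfitting): best_step is the index with the lowest
    validation loss; overfitting is True when the final validation loss is more
    than `tolerance` above that minimum while training loss kept falling.
    """
    if len(train_loss) != len(valid_loss) or not valid_loss:
        raise ValueError("need two non-empty sequences of equal length")
    best_step = min(range(len(valid_loss)), key=valid_loss.__getitem__)
    valid_rose = valid_loss[-1] - valid_loss[best_step] > tolerance
    train_fell = train_loss[-1] < train_loss[best_step]
    return best_step, valid_rose and train_fell
```

If the report flags over-training, fewer epochs (or more varied training examples) is the usual direction to try.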