I am trying to find documentation/examples of what should go in validation_file used when creating a fine tuned model.
The API docs point me to the fine-tuning-guide: OpenAI Platform
The fine-tuning guide points me to the API
Any pointers would be greatly appreciated, I am using ‘gpt-3.5-turbo’ and chat-complitions format in the training data
Thanks!
_j
2
The validation file should be the same type of example conversations as you are training on, in the same type of file format.
The held-out examples should be of the quality where you could shuffle all your questions randomly and put any 10% of them into a validation file.
The validation file lets you see a second benchmark produced during fine-tune: not just how much the learning on the training set has progressed, but how well similar questions are inferred.
There can be a point of over-training or over-specialization where the AI no longer works as well on those similar questions it has not seen before, by being fine tuned to write only what you gave it.
1 Like