Where can I find information about validation_file used in fine tuning?

The validation file should contain the same kind of example conversations you are training on, in the same file format as the training file.

The held-out examples should be just as representative as the training ones: you should be able to shuffle all your examples randomly and move any 10% of them into the validation file.
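As a rough sketch of that shuffle-and-hold-out step, assuming your examples are already loaded as a list of Python dicts (for instance one dict per conversation). The function names, the fixed seed, and the rounding rule here are illustrative choices of mine, not part of any fine-tuning API:

```python
import json
import random


def split_examples(examples, validation_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation.

    Returns (train, validation). The seed is fixed only so the split is
    reproducible; any seed should give an equally usable split.
    """
    shuffled = list(examples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # local RNG, no global state
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, examples):
    """Write one JSON object per line, the usual layout for JSONL files."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

You would call `split_examples(all_examples)` and then `write_jsonl` once for each half, using whatever file names you like. If the split only looks reasonable for one particular seed, that is a hint the examples are not as interchangeable as they should be.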

The validation file gives you a second metric during the fine-tune: not just how far learning on the training set has progressed, but how well the model handles similar questions it was not trained on.

There can be a point of over-training or over-specialization where the AI stops working as well on similar questions it has not seen before, because it has been fine-tuned to write only what you gave it.
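One way to spot that point, sketched under the assumption that you have copied the validation loss values from your job's metrics into a plain list, one value per evaluation. The function name and the `patience` parameter are hypothetical, just to illustrate the idea:

```python
def best_validation_step(valid_losses, patience=2):
    """Return the index of the lowest validation loss seen before the loss
    fails to improve for `patience` evaluations in a row.

    Training loss normally keeps falling; validation loss turning upward
    is the usual sign of over-training.
    """
    if not valid_losses:
        raise ValueError("valid_losses must not be empty")
    best_index = 0
    best_loss = float("inf")
    evaluations_without_improvement = 0
    for index, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_loss = loss
            best_index = index
            evaluations_without_improvement = 0
        else:
            evaluations_without_improvement += 1
            if evaluations_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_index


# Made-up numbers, only to show the shape of the curve.
losses = [1.9, 1.4, 1.1, 1.0, 1.05, 1.2, 1.4]
print(best_validation_step(losses))  # prints 3
```

Anything trained much past the step this returns is likely getting better at reproducing your training file and worse at the similar questions it has not seen.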
