How is the fine-tuned model picked?

I fine-tuned gpt-3.5-turbo-1106 on my own dataset, which has about 800 training samples and about 150 validation samples, for 3 epochs.

From the attached image, the checkpoint at step 2201 seems to have lower training and validation loss. How does OpenAI pick the checkpoint that is used when I run inference? Does it by default pick the checkpoint with the lowest training/validation loss, or the one at the last step?

I couldn’t find anything about this in the documentation either, so some transparency would be nice.


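You get only the end result: the model as it stands when training finishes. There is no selection of a lowest-loss checkpoint for you.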

You can use the resulting loss curves to get some notion of training quality, likely inference behavior, and overfitting.

The training and validation loss at any single step may be perturbed by where the learning happens to be when the evaluation is performed, so a dip at one step does not necessarily mean a better model at that step. The reported learning statistics, internal steps, and batches also aren’t documented as corresponding to divisions between whole examples.
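
If you want to judge the curves yourself rather than eyeball one noisy point, here is a minimal sketch of how that could be done. It assumes you have the job's metrics as CSV text with columns named `step`, `train_loss`, and `valid_loss`; those column names, the helper names, and the sample numbers are assumptions for illustration, so adjust them to whatever your results file actually contains. It smooths the validation loss with a trailing moving average and compares the best smoothed step against the final step.

```python
import csv
import io


def load_metrics(csv_text):
    """Parse metrics CSV text into (step, train_loss, valid_loss) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Validation loss may be blank on steps where no evaluation ran.
        if not row.get("valid_loss"):
            continue
        rows.append(
            (int(row["step"]), float(row["train_loss"]), float(row["valid_loss"]))
        )
    return rows


def smooth(values, window=3):
    """Trailing moving average over up to `window` most recent values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def summarize(csv_text, window=3):
    """Compare the lowest smoothed validation loss with the final one."""
    rows = load_metrics(csv_text)
    steps = [r[0] for r in rows]
    valid = smooth([r[2] for r in rows], window)
    best = min(range(len(valid)), key=valid.__getitem__)
    return {
        "best_step": steps[best],
        "best_smoothed_valid_loss": valid[best],
        "final_step": steps[-1],
        "final_smoothed_valid_loss": valid[-1],
    }


# Invented sample data, only to show the shape of the input.
SAMPLE = """step,train_loss,valid_loss
1,1.0,1.2
2,0.8,1.0
3,0.6,0.9
4,0.5,1.1
5,0.4,1.3
6,0.3,
"""

if __name__ == "__main__":
    print(summarize(SAMPLE, window=3))
```

If the smoothed validation loss at the final step is clearly higher than the best smoothed value while training loss keeps falling, that points to overfitting, and retraining with fewer epochs is the practical way to end up with a model closer to the earlier point.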

