I started a fine-tuning job for gpt-4o-mini an hour ago on a dataset of around 3500 samples with a batch size of 9. At first it went smoothly: the job was queued and started training within a couple of minutes.
My model reached the 304th step within a few minutes and then just stopped. It has been like this for about an hour now and it is not moving forward at all.
The job has not changed status or reported any errors. It still shows as fine-tuning, but the metrics aren’t updating.
Is this normal? When is it supposed to start training again? Are there any checkpoints saved? I assume it has not finished a full epoch yet, since dataset samples / batch size works out to roughly 3500 / 9 ≈ 389 steps per epoch.
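For reference, here is a quick sketch of the arithmetic behind my estimate. It assumes one training step per batch and that the final partial batch still counts as a step; both are my assumptions, not something the dashboard confirms. The helper name `steps_per_epoch` is just for illustration, and 3500 is my approximate sample count.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Estimate steps in one epoch, counting a partial last batch as a step."""
    return math.ceil(num_samples / batch_size)


num_samples = 3500  # approximate dataset size from my job
batch_size = 9      # batch size I configured
stalled_at = 304    # last step shown before the metrics stopped updating

total = steps_per_epoch(num_samples, batch_size)
print(total)  # 389
print(f"{stalled_at / total:.0%} of the first epoch")  # 78% of the first epoch
```

If those assumptions hold, the job stalled about three quarters of the way through the first epoch, which is why I doubt any end-of-epoch checkpoint exists yet.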