Fine tuning stuck after a few steps

production4 · November 7, 2024, 4:43am

I started a fine-tuning job for gpt-4o-mini an hour ago on a dataset of around 3500 samples with a batch size of 9. It went smoothly at first: the job was queued and started training within a couple of minutes.

The job reached step 304 within a few minutes and then just stopped. It has been stuck there for about an hour now and is not moving forward at all.

The status has not changed and no problems have been reported. It still says fine-tuning, but the metrics aren’t updating.
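For anyone wanting to check the same thing from the API rather than the dashboard, here is a minimal sketch. It assumes the official `openai` Python package (v1-style client) and that `client` is an already constructed `OpenAI()` instance; `describe_job` and `max_events` are names made up for this example, not part of the library.

```python
from typing import Any


def describe_job(client: Any, job_id: str, max_events: int = 5) -> dict:
    """Collect a fine-tuning job's status and its most recent event messages.

    `client` is assumed to be an `openai.OpenAI()` instance; it is passed in
    so the function itself needs no API key or network setup.
    """
    # Current state of the job as reported by the API.
    job = client.fine_tuning.jobs.retrieve(job_id)

    # Recent event log entries (step metrics, warnings, errors).
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id=job_id, limit=max_events
    )

    return {
        "status": job.status,
        "trained_tokens": job.trained_tokens,
        "recent_events": [event.message for event in events.data],
    }
```

Calling `describe_job(client, job_id)` with the job's own ID every few minutes and comparing `recent_events` between calls would show whether the event log is still advancing even when the dashboard metrics look frozen.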

Is this normal? When is it supposed to start training again? Are there any checkpoints saved? I assume it has not finished even one full epoch yet, since dataset samples / batch size (3500 / 9) comes to roughly 389 steps per epoch.
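The epoch estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes one step per batch and that a final partial batch still counts as a step (hence the ceiling), and the helper names are invented for illustration. The sample count is the approximate figure from the post, not an exact one.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Steps needed to see every sample once, counting a partial last batch."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


def epochs_completed(current_step: int, num_samples: int, batch_size: int) -> int:
    """Number of full epochs finished by the time `current_step` is reached."""
    return current_step // steps_per_epoch(num_samples, batch_size)


per_epoch = steps_per_epoch(3500, 9)      # 389 under the assumptions above
done = epochs_completed(304, 3500, 9)     # 0, i.e. still inside the first epoch
print(f"{per_epoch} steps per epoch, {done} full epochs completed at step 304")
```

Under these assumptions, step 304 is a little over three quarters of the way through the first epoch.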
