Hi, while fine-tuning a dataset with “gpt-4.1-nano-2025-04-14” I encounter an internal error . The files are validated without a problem but then the failure occurs. I wonder what might be the reason. My training file has only 100 and the validation file 50 lines of data. I tried batch size=1 first, then 4 but both have failed.
Hi and welcome to the Community!
I was able to reproduce this issue.
My fine-tuning job with gpt-4.1-nano-2025-04-14 ended just as just described:
The job failed due to an internal error. after several attemtps.
Job-ID: ftjob-vVd67BHKa8hb4TQlM2nO61P4
Flagging this to the team.
Can someone help me out here? I am getting the same errors - “The job experienced an error while training and failed, it has been re-enqueued for retry.“ and sometime first an “Internal error”. I tried various acceptable jsonl formats of model training and sizes but no effect. I tried at least a dozen times. I also saw once that the OpenAI platform is currently upgrading and this cause the error. Please help I am working on a mission critical project.
I am getting same error……. still no fix ![]()
Hi everyone,
I started a supervised fine-tuning job using gpt-4o-mini-2024-07-18 with a very small JSONL dataset (only a handful of training examples).
*Can someone explain why a fine-tuning job with a very small dataset is taking this long to complete?
Thank you*
I guess I’m glad I’m not the only one!
I have been experiencing failed fine-tuning jobs since Thursday with a process that I have used successfully hundreds of times in the past. The base model is gpt-4o-mini-2024-07-18. In most cases, the job retries multiple times over the course of several hours and then finally fails with “The job failed due to an internal error.”
Over the past several days, I’ve had 13 fine-tuning jobs fail and 3 succeed.
I’ve raised an issue with OpenAI Support and have been told, “We will continue taking ownership of the underlying issue and ensure you have clear paths to keep moving forward.” but no helpful guidance or improvement.
I have this same error ![]()
+1 since last week timeline as well for finetuning gpt-4o-mini-2024-07-18 model.
What’s going on?
The status page indicates everything is peachy, but I’m still not able to fine-tune.
For reference, I am fine-tuning with base model gpt-4o-mini-2024-07-18.
Yesterday, I kept getting errors and re-queueing of jobs.
Today, my job appears to be stuck at “Files validated, moving job to queued state”.
![]()
edit: It finally started after more than 30 minutes, and proceeded to err/time out after another 30 minutes and get re-enqueued again. 3 hours later, the job ultimately failed.
This morning, fine-tuning seems to work again. ![]()
Thank you, whoever fixed the whatever!
This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.