"The job experienced an error while training and failed, it has been re-enqueued for retry."

I am trying to fine-tune the gpt-4o-mini-2024-07-18 model, but I get the error in the title after every 99 steps and the job restarts. After 3 tries I get “The job failed due to an internal error.” Please let me know if I’m doing something wrong, as my JSONL files seem to be valid. It’s been 1 day.

Same error here, been scratching my head over whether it’s an OpenAI issue or something with my .jsonl file.
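In case it helps rule out the .jsonl side, here is a rough local check, sketched under some assumptions: it assumes the chat fine-tuning format (one JSON object per line with a "messages" list of role/content entries), and the role set and function names below are my own, not an official OpenAI validator. It only checks structure, so a clean result does not prove the failure is on OpenAI's side, it just removes one suspect.

```python
import json

# Assumption: roles accepted in chat-format training data.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_jsonl_lines(lines):
    """Return (line_number, problem) tuples for chat-format training lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for msg in messages:
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message with missing or unknown role"))
                break
        else:
            # Every message had a known role; require at least one assistant turn.
            if not any(m.get("role") == "assistant" for m in messages):
                problems.append((number, "no assistant message"))
    return problems


def check_jsonl_file(path):
    """Hypothetical helper: run the same checks over a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return check_jsonl_lines(handle)
```

Calling check_jsonl_file on your training file returns an empty list when every line passes these checks; anything else points to the line numbers worth looking at first.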

Has anyone been able to successfully resolve it?

Hi and welcome to the community!

Maybe you have been affected by the most recent issues with 4o and mini?

I suggest trying again to see if it works now.

No, I’m still facing the same issue. I’ve tried about 7-8 times in the last 3 hours.

Update: Tried it another 5-6 times. Still no fix. I would greatly appreciate it if someone could guide me. I looked at the OpenAI status page and there’s no such outage being reported for today…

Hi, has anyone been able to resolve it? I am trying to fine-tune my model and it fails every time before it reaches 100 steps.

I don’t know what to do. Can any OpenAI folks guide me, please?

I can see many threads on the same issue, but no resolution.
