The Job Failed Due to an Internal Error | Fine-tuning gpt4o-mini

r.devkota.98 · December 4, 2024, 7:41am

I am facing an error while fine-tuning the gpt4o-mini model. I am able to fine-tune the gpt4o model with the same dataset, but the process fails with gpt4o-mini. The training stops consistently at step 98/1173. Please find the attached image for reference.

I am constantly facing this error: The job experienced an error while training and failed, it has been re-enqueued for retry.

sinmu8191 · December 18, 2024, 2:20am

Hello, I have been encountering the same problem since the 16th. Have you found a solution?

davidjosephind · December 18, 2024, 3:46pm

Multiple fine-tuning jobs are failing for me as well. This wasn’t the case before. Is it still happening for you as well? Have you found a solution?

e.kartal115 · December 18, 2024, 8:58pm

I am also in the same situation.
I’d appreciate a solution for this.

sinmu8191 · December 19, 2024, 3:03am

Each time I fine-tune, I start from 4o-mini itself rather than building on a previously fine-tuned model.

Has anyone tried this method?
Divide the training set into batches of 100 examples each.
The first batch fine-tunes model A based on 4o-mini.
The second batch fine-tunes model B based on model A.
The third batch fine-tunes model C based on model B.
…
Continue until all the data has been used, giving model N.

The basis of this method is that fine-tuning with a small amount of data is mostly successful. Has anyone tried it?
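The chained approach described above could be sketched roughly as follows. This is only an illustration, not something confirmed to work around the failure: it assumes `client` is an `openai.OpenAI()` instance from the official Python SDK, and the helper names `split_jsonl` and `chain_fine_tunes` are made up for this example. Note that a very small final chunk may be rejected if the API enforces a minimum number of examples per file.

```python
import json
import time
from typing import Iterator, List


def split_jsonl(path: str, chunk_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most chunk_size non-empty JSONL lines."""
    chunk: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            json.loads(line)  # fail early on a malformed example
            chunk.append(line)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


def chain_fine_tunes(client, path: str, base_model: str,
                     chunk_size: int = 100, poll_seconds: float = 30.0) -> str:
    """Fine-tune on each chunk in turn, starting each job from the previous result.

    `client` is assumed to be an openai.OpenAI() instance.
    """
    model = base_model
    for i, chunk in enumerate(split_jsonl(path, chunk_size)):
        data = ("\n".join(chunk) + "\n").encode("utf-8")
        uploaded = client.files.create(file=(f"chunk_{i}.jsonl", data),
                                       purpose="fine-tune")
        job = client.fine_tuning.jobs.create(training_file=uploaded.id, model=model)
        # Poll until the job reaches a terminal state.
        while job.status not in ("succeeded", "failed", "cancelled"):
            time.sleep(poll_seconds)
            job = client.fine_tuning.jobs.retrieve(job.id)
        if job.status != "succeeded":
            raise RuntimeError(f"chunk {i} ended with status {job.status}")
        model = job.fine_tuned_model  # model A, then B, then C, ...
    return model
```

Calling `chain_fine_tunes(client, "train.jsonl", base_model)` would return the name of the last model in the chain (model N above); the file name here is a placeholder.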

jechearte · December 19, 2024, 12:59pm

I have the same problem. On https://status.openai.com/ it says that on December 16 there was an incident related to the fine-tuning API. It states that it was resolved, but it doesn’t seem to be true…

shahmoosaraza · December 19, 2024, 3:49pm

I am having the same issue. I ran a fine-tuning job twice. Each time it validates the training set (dataset), shows status: running for a couple of minutes, and even gives me an estimated finish time, but then I receive the error: The job failed due to an internal error.

r.devkota.98 · December 19, 2024, 4:45pm

This issue is from OpenAI itself. They don’t have many answers other than stating they faced some downtime. You can mail their support team, and they typically fix it within 2-3 days.

grapeot · December 22, 2024, 6:39am

I encountered the exact same error as well. Interestingly, it consistently occurs at a specific iteration. I developed a few hypotheses about why it’s failing and attempted to adjust the format and content of my training data, but I haven’t been able to solve the problem. It would be great if OpenAI could step in and help us resolve this issue.

r.devkota.98 · December 22, 2024, 4:21pm

This is an issue on OpenAI's side. If it occurs consistently, you need to email them and include the job ID.
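For gathering the job ID and error details to include in such an email, something like the sketch below may help. It assumes `client` is an `openai.OpenAI()` instance from the official Python SDK; `describe_failed_job` is a hypothetical helper name, not part of the SDK.

```python
def describe_failed_job(client, job_id: str, max_events: int = 20) -> str:
    """Collect the job ID, status, error, and recent events as plain text.

    `client` is assumed to be an openai.OpenAI() instance.
    """
    job = client.fine_tuning.jobs.retrieve(job_id)
    lines = [
        f"job id: {job.id}",
        f"model: {job.model}",
        f"status: {job.status}",
    ]
    if job.error is not None:
        lines.append(f"error: {job.error.code}: {job.error.message}")
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id=job_id, limit=max_events
    )
    for event in events.data:
        lines.append(f"[{event.level}] {event.message}")
    return "\n".join(lines)
```

The returned text can be pasted into the support request as-is.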

dwmann5 · December 22, 2024, 7:24pm

Same here with gpt4o-mini and also gpt3.5turbo.

dwmann5 · December 22, 2024, 7:25pm

Which OpenAI email should we use for this type of error?

rahul.lanjewar · December 30, 2024, 6:32pm

Thanks for the suggestion. This works. I kept running into internal error failures while fine-tuning 4o-mini. Those jobs were run as a single batch. Setting the batch-size hyperparameter so that the data was split into multiple batches of roughly 100 each did the trick!
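One possible reading of this, as a sketch: pass an explicit `batch_size` when creating the job. This assumes the `hyperparameters` argument of `client.fine_tuning.jobs.create` in the OpenAI Python SDK as it existed around the time of this thread; `fine_tune_request` is a hypothetical helper, and the value 100 is taken from the description above rather than from any documented recommendation.

```python
def fine_tune_request(training_file_id: str, model: str,
                      batch_size: int = 100) -> dict:
    """Build keyword arguments for a fine-tuning job with an explicit batch size."""
    return {
        "training_file": training_file_id,
        "model": model,
        # Assumption: batch size is set through the `hyperparameters` argument.
        "hyperparameters": {"batch_size": batch_size},
    }
```

With a real client this would be used as `client.fine_tuning.jobs.create(**fine_tune_request(file_id, model_name))`, where `file_id` and `model_name` are your own values.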

vb · January 2, 2025, 10:48pm

A bugfix has been deployed.
You can try running your jobs again. If you run into further issues, please report them in the topic below.

vb · January 4, 2025, 10:49pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
