Chatgpt 4o-mini fine-tuning fails.Internal error

Hello,
I am encountering the following error:
The job experienced an error while training and failed; it has been re-enqueued for retry.

After this error, the fine-tuning process completely fails.

I am trying to fine-tune the model with German, French, and Turkish data.

  • I have already trained the AI with English data, and now I am training it with only 10% of the data in each of the languages mentioned above. My goal is to ensure the model works well with these languages.
  • Initially, I created a single file containing data in all three languages and started fine-tuning. It failed.
  • Then, I broke it down into three separate files and started simultaneous fine-tuning jobs.
    • The fine-tuning for French and German succeeded, but Turkish failed.
  • I examined the Turkish data, but everything seems fine.

Because both French and German fine-tuning succeeded, I thought I could fine-tune the model trained with French data using the German data, or vice versa.

  • However, this attempt failed again (I tried both vice versa scenarios).

This behavior is quite puzzling, and I am unsure of what steps I should take next.

Fine-Tune jobs are failing a lot for me as well. I think it is an issue on OpenAI’s end.

This was not an issue I have faced before, but now most fine-tuning jobs are failing for me as well with the same error message.

I have the same problem, I can only try once every 2-3 hours

Same thing has been happening to me now for GPT-4o. Nearly every fine-tuning job I run is failing. The strange thing is I can fine-tune the exact same file with GPT-3.5-turbo and it works fine. This tells me something is going on with OpenAI.

I am having the same issue with gpt4o-mini and a .jsonl file I have used before. Is it possible that the .jsonl format has changed since the fine tuning process now has two options (supervised and direct preference optimization)? I assume that the older format is supervised which is the default. The file I submitted passed the validation stage and the training began with metrics showing, but then it failed with the internal error a few minutes later.

Update - my gpt-3.5-turbo job is failing now too.

Update - the gpt4o-mini job finally went thru (!) after many retries and hours. Obviously an internal OpenAI issue.

I’m experiencing the same issue and have tried multiple times since yesterday. Has anyone been able to resolve it or identify the source of the problem?

A fix has been deployed.
You can try to run your jobs again and please report in the topic below, if you should run into further issues.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.