Hello,
I am encountering the following error: "The job experienced an error while training and failed; it has been re-enqueued for retry."
After this error, the fine-tuning process completely fails.
I am trying to fine-tune the model with German, French, and Turkish data.
I have already trained the AI with English data, and now I am training it with only 10% of the data in each of the languages mentioned above. My goal is to ensure the model works well with these languages.
Initially, I created a single file containing data in all three languages and started fine-tuning. It failed.
Then, I broke it down into three separate files and started simultaneous fine-tuning jobs.
The fine-tuning for French and German succeeded, but Turkish failed.
I examined the Turkish data, but everything seems fine.
Because both French and German fine-tuning succeeded, I thought I could fine-tune the model trained with French data using the German data, or vice versa.
However, this attempt also failed (I tried both orders: French first and then German, and German first and then French).
This behavior is quite puzzling, and I am unsure of what steps I should take next.
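For anyone who wants to double-check a file like the Turkish one beyond reading through it, here is a minimal sketch of a local check. It assumes the chat-style JSONL layout (one JSON object per line with a "messages" list). The function names and the set of accepted roles are my own assumptions for illustration, not something taken from OpenAI's validator. A clean result here does not rule out a server-side failure; it only rules out the obvious file problems.

```python
import json

# Assumed set of roles; adjust if your data uses others.
VALID_ROLES = {"system", "developer", "user", "assistant", "tool"}


def check_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples for chat-format records."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            problems.append((number, "blank line"))
            continue
        # U+FFFD usually means the text was decoded with the wrong encoding.
        if "\ufffd" in text:
            problems.append((number, "replacement character (possible encoding damage)"))
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in VALID_ROLES:
                problems.append((number, "message with missing or unknown role"))
                break
        else:
            # Only reached when every message had a recognised role.
            if not any(m.get("role") == "assistant" for m in messages):
                problems.append((number, "no assistant message to learn from"))
    return problems


def check_jsonl_file(path):
    """Read the file as UTF-8 and run the line checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_jsonl_lines(handle)
```

Calling check_jsonl_file on the training file (for example a hypothetical "turkish.jsonl") returns an empty list when none of these checks trip. Opening the file explicitly as UTF-8 matters for Turkish text, since characters such as ı, ş, and ğ are a common place for encoding mistakes to show up.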
The same thing has been happening to me with GPT-4o. Nearly every fine-tuning job I run fails. The strange thing is that I can fine-tune the exact same file with GPT-3.5-turbo and it works fine. This suggests to me that the problem is on OpenAI's side.
I am having the same issue with gpt-4o-mini and a .jsonl file I have used before. Is it possible that the .jsonl format has changed, since the fine-tuning process now has two options (supervised and direct preference optimization)? I assume that the older format is supervised, which is the default. The file I submitted passed the validation stage and training began, with metrics showing, but then it failed with the internal error a few minutes later.
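On the format question, a quick way to see which layout a file actually uses is to look at the top-level keys of each record. The sketch below is only that, a sketch. It assumes supervised records carry a "messages" list and preference (DPO) records carry "input", "preferred_output", and "non_preferred_output". Those key names are my understanding of the documented formats and should be checked against the current docs, and the function names are made up for illustration. It cannot tell whether the format changed on the server side.

```python
import json
from collections import Counter

# Assumed top-level keys of a preference (DPO) record.
PREFERENCE_KEYS = {"input", "preferred_output", "non_preferred_output"}


def record_format(record):
    """Classify one parsed JSONL record as supervised, preference, or unknown."""
    if not isinstance(record, dict):
        return "unknown"
    if PREFERENCE_KEYS.issubset(record):
        return "preference"
    if isinstance(record.get("messages"), list):
        return "supervised"
    return "unknown"


def count_formats(lines):
    """Count record formats across an iterable of JSONL lines, skipping blanks."""
    counts = Counter()
    for raw in lines:
        if raw.strip():
            counts[record_format(json.loads(raw))] += 1
    return counts
```

If every record comes back as "supervised", the file matches the older layout described above. Since the file passed validation and training started, a format mismatch seems less likely than a failure during training itself, but that is a guess rather than something the thread confirms.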
I’m experiencing the same issue and have tried multiple times since yesterday. Has anyone been able to resolve it or identify the source of the problem?