Fine-tuning fails on gpt-4o-mini-2024-07-18

Hi,
I'm trying to fine-tune the gpt-4o-mini-2024-07-18 model, but I constantly get “The job failed due to an internal error.” with no other details. What's strange is that the same file I used for fine-tuning gpt-4o-mini-2024-07-18 works fine with gpt-3.5-turbo and gpt-4o-2024-08-06. Are there any special requirements for gpt-4o-mini-2024-07-18? I used the validation code from the cookbook's chat_finetuning_data_prep notebook, and everything in my file seems fine.
Any ideas would be appreciated.
Thank you
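For anyone hitting this, here is a minimal sketch of the kind of format checks the cookbook's chat_finetuning_data_prep notebook runs. This is simplified and not exhaustive — the `messages`/`role`/`content` field names match the chat fine-tuning JSONL format, but the `validate_dataset` helper, its exact checks, and the `max_messages` default are my own assumptions:

```python
import json

def validate_dataset(path, max_messages=2048):
    """Minimal JSONL format checks for a chat fine-tuning file.

    Loosely modeled on the cookbook's chat_finetuning_data_prep notebook
    (simplified; a clean result here does not guarantee the job will run).
    """
    errors = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            # Each line must be a standalone JSON object
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: invalid JSON")
                continue
            messages = example.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {i}: missing 'messages' list")
                continue
            # The per-example message cap discussed in this thread
            if len(messages) > max_messages:
                errors.append(
                    f"line {i}: {len(messages)} messages exceeds {max_messages}"
                )
            for m in messages:
                if m.get("role") not in ("system", "user", "assistant", "tool"):
                    errors.append(f"line {i}: unexpected role {m.get('role')!r}")
                if "content" not in m and "tool_calls" not in m:
                    errors.append(f"line {i}: message has no content")
            # There must be at least one assistant turn to learn from
            if not any(m.get("role") == "assistant" for m in messages):
                errors.append(f"line {i}: no assistant message")
    return errors
```

Running this over the training file at least rules out the obvious formatting problems before blaming the backend.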

Having the same issue. I'm reading other threads on the topic, and there doesn't seem to be any consensus on a solution. I'm still trying to figure out whether this is an error on our end or OpenAI's. I'll keep requeuing jobs until one works. Chat support hasn't been helpful.

I was thinking something was wrong with my data, but the same file with 450+ JSONL lines worked for fine-tuning yesterday and failed today (on gpt-4o-mini-2024-07-18). So I'm starting to think it may be an OpenAI problem.

Did you have any success running the same job multiple times?

I split my data into subsets and fine-tuned them individually. The ones that failed had one thing in common: more than one example was at the 2048-message-per-example limit. So I tried fine-tuning the failing subsets with a cap of 1024 messages per example instead of 2048, and that worked.

I am currently training my entire dataset with this in mind; the file validation takes a long time, so I will report back if it works. Perhaps there is some extra overhead or the actual limit is less than 2048.


Update: still failing due to an internal error. I’m out of ideas

This issue has been forwarded to OpenAI.
Thanks for flagging!


Hi,
I have the same error, any solution?

Hello everyone (@samberk , @bence.lukacsy , @wpstream ),

Apologies for the delayed response here. This was not caused by any errors on your end (bad datasets, hyperparameters, etc.); it was caused by an internal error on our end that slipped through our monitoring system and went mostly unnoticed through late December.

We believe we have pushed a fix as of 1:48pm PST today (Jan 2nd). Please try rerunning your jobs.

Happy New Year!


Thanks @john.allard

Looks like it’s working now.


@john.allard @vb I still encounter the error above.
I'm trying to fine-tune the gpt-4o-mini-2024-07-18 model, but I constantly get “The job failed due to an internal error.” with no other details.

Hi, I am currently facing the same issue. Is there a solution? Has anyone else encountered this again?

Returned to this after a short break, but I'm unfortunately still facing the internal error (gpt-4o-mini-2024-07-18).

(Edit): I found a workaround: limiting the number of messages per example to a lower value, such as 512. Ideally the maximum (2048) would be used to reduce the amount of context lost, but it just doesn't seem to work. I tried various values all the way down to 600, and they all failed. 512 works for me, but I do lose a lot of context, since I'm splitting examples up more frequently.
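For reference, a rough sketch of that splitting workaround, assuming each JSONL example is an object with a `messages` list. The 512 cap is empirical from this thread, not a documented limit, and the `split_example` helper is hypothetical:

```python
def split_example(example, max_messages=512):
    """Split one training example's message list into chunks of at most
    max_messages messages each.

    A workaround sketch for the internal error discussed above. Note that
    splitting a conversation this way does lose cross-chunk context.
    """
    messages = example["messages"]
    # Carry a leading system message (if present) into every chunk
    system = [m for m in messages[:1] if m.get("role") == "system"]
    rest = messages[len(system):]
    step = max_messages - len(system)
    chunks = []
    for start in range(0, len(rest), step):
        chunk = system + rest[start:start + step]
        # Keep only chunks that still contain an assistant turn to learn from
        if any(m.get("role") == "assistant" for m in chunk):
            chunks.append({"messages": chunk})
    return chunks
```

Applying this to every example before writing the JSONL file guarantees no example exceeds the cap, at the cost of the lost context mentioned above.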
