You aren’t the only person reporting “500” errors when trying to fine-tune recently, so this may be something that worked before but is genuinely broken now.
I would try this against the standard gpt-3.5-turbo model (gpt-3.5-turbo-0613).
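If you go that route, here is a minimal sketch of pinning a job to that snapshot with the openai Python SDK (assuming the v1-style client; the training-file ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pin the base model to the stable snapshot instead of the 1106 preview.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",  # placeholder: your uploaded JSONL file ID
    model="gpt-3.5-turbo-0613",
)
print(job.id, job.status)
```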
The 1106 preview model will likely get a new version some time soon, given the errors reported with its function-calling abilities in different languages, which leaves fine-tunes on that model with an uncertain fate. Its “recommended” status should really read “not recommended”.
Also, there have been errors with particular characters, such as an accented “e”, in training files, though those seemed to be caught up front by a file-validation error.
Thank you for your reply, but my whole training data needs the 16k context size to work.
Also, I tried training the model on a file that I had already successfully trained on before, and got the same error, so the error most likely appeared recently after some change on OpenAI’s side.
Based on your experience, do you think it will be resolved soon? I am stuck right now.
Also, I am curious why there is no mention of this on the OpenAI status page or anywhere else. Any idea? Because of all this, I was starting to feel like I am the only one facing it.
If my list of fine-tunes weren’t already cluttered with many undeletable “testing” tunes, I’d be glad to run a “find out if it is just you” tune.
Perhaps some notes, both for investigation by those at OpenAI with actual power to fix things, and for your own investigation of what to explore to get a new training file through (see the validation sketch after this list):
How many training examples?
Have you tuned before on similar?
Are you using functions within the fine-tune? Do they independently validate?
Are you using Unicode or characters outside of ASCII 128?
Are you using all three roles in sequence: system, user, assistant? Are you extending that conversation example with further turns?
Attempting to train on more than 4k tokens per example in total?
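If useful, here is a rough pre-flight script covering several of these checks. A sketch only, not OpenAI’s actual validator; the file name is a placeholder:

```python
import json

def check_training_file(path: str) -> None:
    """Rough pre-flight checks mirroring the questions above."""
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            example = json.loads(line)
            messages = example.get("messages", [])
            roles = [m.get("role") for m in messages]

            # All three roles in sequence: system, user, assistant?
            if roles[:3] != ["system", "user", "assistant"]:
                print(f"example {i}: roles begin {roles[:3]}")

            # Characters outside ASCII 128?
            if not line.isascii():
                print(f"example {i}: non-ASCII characters present")

            # Function-calling turns worth validating separately?
            if any("function_call" in m for m in messages):
                print(f"example {i}: contains a function_call")

check_training_file("training.jsonl")  # placeholder file name
```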
Have you tuned before on similar?
Yes, and it worked fine.
Are you using functions within the fine-tune? Do they independently validate?
Yes, a lot of them. I had previously trained on these functions.
Are you using Unicode or characters outside of ASCII 128?
Yes, a few examples were in different languages. But since you mentioned that OpenAI was facing issues with function calling in different languages, five minutes ago I removed all those examples and started a new job with only ASCII-128 characters.
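A filter along these lines does that removal (a sketch; file names are placeholders):

```python
# Keep only examples whose raw JSONL line is pure ASCII-128.
with open("training.jsonl", encoding="utf-8") as src, \
        open("training_ascii.jsonl", "w", encoding="utf-8") as dst:
    kept = dropped = 0
    for line in src:
        if line.isascii():
            dst.write(line)
            kept += 1
        else:
            dropped += 1

print(f"kept {kept} examples, dropped {dropped}")
```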
Are you using all three roles in sequence: system, user, assistant? Are you extending that conversation example with further turns?
Yes, function is also a role.
Attempting to train on more than 4k tokens per example in total?
Yes, a lot of them are above 4k; the largest goes up to 12k.
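For reference, per-example counts like these can be estimated with tiktoken (a sketch; the file name is a placeholder, and the trainer adds small per-message overheads on top):

```python
import json
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

with open("training.jsonl", encoding="utf-8") as f:  # placeholder name
    for i, line in enumerate(f, start=1):
        example = json.loads(line)
        # content can be None on assistant turns that only make a function call
        tokens = sum(
            len(enc.encode(m.get("content") or "")) for m in example["messages"]
        )
        if tokens > 4000:
            print(f"example {i}: ~{tokens} tokens")
```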
I have fine-tuned gpt-3.5-turbo multiple times over the past ~8 days with training data of varying sizes, ranging from several hundred up to several thousand examples. No issues on my end.
No worries @yashukla. The last time was this morning, but it was just a very small file for a quick experiment. So far I have not included function calls in my training data.
I’ll be doing one more round of training on a larger file (3,000+ training examples) at some point tomorrow. Will let you know then whether it works.
Tried all permutations and combinations after you asked the question, @jr.2509.
Then I divided the main data into small batches and saw which ones were failing, analyzed the data of the failed batches, and at last, luckily, found the issue:
It was happening because of a structural error in the arguments of a function call.
OpenAI is not running any validation check for these as of now, and I think fine-tuning suddenly started expecting perfectly structured function calls; that is what was causing the issues.
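A pre-upload check along these lines would catch that class of error, since a function_call carries its arguments as a JSON-encoded string that must parse cleanly (a sketch; the file name is a placeholder):

```python
import json

# Every assistant function_call carries its arguments as a JSON *string*;
# if that string does not parse, the example is malformed.
with open("training.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        example = json.loads(line)
        for m in example["messages"]:
            call = m.get("function_call")
            if call is not None:
                try:
                    json.loads(call["arguments"])
                except (json.JSONDecodeError, KeyError, TypeError) as e:
                    print(f"example {i}: bad function_call arguments ({e})")
```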
Super glad to hear that all of your analysis paid off and you finally got it working again. Definitely some good learnings and insights from your case!