Internal server error in fine-tuning

Hi, I am working on fine-tuning the gpt-3.5-turbo-1106 model, but for the last 2 days it has just been giving internal server errors again and again.

I have checked everything but was not able to find anything; the file is perfect, and there were no errors while uploading it.

Please help, I feel stuck.

You aren’t the only person to get “500” errors when trying to tune recently, so it may be something that would have worked before but is indeed acting broken now.

I would try this against the standard gpt-3.5-turbo model (gpt-3.5-turbo-0613).

The preview AI model 1106 will likely have a new version coming out some time soon due to errors reported with its function-call abilities in different languages, giving tunes on the model an uncertain fate. “Recommended” should be changed to “not recommended”.

Also, there have been errors with particular characters, such as accented “e” in training files, but those seemed to be initially caught by a file validation error.
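
If you want to rule that out locally before uploading, a quick scan for anything outside of ASCII is easy enough. A minimal sketch (the file name is just a placeholder for your own training file):

```python
# Minimal sketch: report any non-ASCII characters in a fine-tuning JSONL file.
# "training.jsonl" is a placeholder for your own file.
with open("training.jsonl", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        non_ascii = sorted({ch for ch in line if ord(ch) > 127})
        if non_ascii:
            print(f"line {line_no}: non-ASCII characters {non_ascii}")
```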

Thank you for your reply, but my training data needs the 16k context size to work.

Also, I tried training the model on a file that I had already trained it on, and I got the same error, so most likely the error appeared recently after some change on OpenAI’s side.

Based on your experience, do you think it will be solved soon? I am stuck right now.

Also, I am curious why there is no mention of this on the OpenAI status page or anywhere else. Any idea? Because of all this, I was feeling like I am the only one facing it.

Were it not for my list of fine-tunes already having many “testing” tunes that are undeletable clutter, I’d be glad to do a “find out if it is just you” tune.

Perhaps note these for investigation by those from OpenAI with the actual power to fix it, and for your own investigation of what you might change to get a new training file through (a rough checking script is sketched after the list):

  • How many training examples?
  • Have you tuned before on similar?
  • Are you using functions within the fine-tune? Do they independently validate?
  • Are you using Unicode or characters outside of ASCII 128?
  • Are you using all three roles in sequence: system, user, assistant? Are you extending that conversation example?
  • Attempting to train on more than 4k tokens per example in total?
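
To go through most of that list mechanically, here is a rough sketch of the kind of local checks I would run on the JSONL before uploading. It assumes the cl100k_base encoding for token counting and a placeholder file name, and it only approximates what OpenAI’s own validation does:

```python
# Rough sketch of local pre-upload checks on a chat fine-tuning JSONL file.
# Assumptions: "training.jsonl" is a placeholder path; token counts use the
# cl100k_base encoding as an approximation, not OpenAI's own validator.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("training.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

print(f"{len(examples)} training examples")

for i, ex in enumerate(examples, start=1):
    messages = ex.get("messages", [])
    roles = [m.get("role") for m in messages]

    # Role check: each example should at least have a user and an assistant turn.
    if "user" not in roles or "assistant" not in roles:
        print(f"example {i}: roles are {roles}, missing user or assistant")

    # Approximate total token count of the message contents.
    tokens = sum(len(enc.encode(m.get("content") or "")) for m in messages)
    if tokens > 4096:
        print(f"example {i}: roughly {tokens} tokens of content, above 4k")

    # Characters outside of ASCII 128 that may be worth double-checking.
    if any(ord(ch) > 127 for m in messages for ch in (m.get("content") or "")):
        print(f"example {i}: contains characters outside of ASCII 128")
```
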
  • How many training examples?
    Around 115 examples.

  • Have you tuned before on similar?
    Yes, and it worked fine.

  • Are you using functions within the fine-tune? Do they independently validate?
    Yes, a lot of them. I have trained on the functions before.

  • Are you using Unicode or characters outside of ASCII 128?
    Yes, a few examples were in different languages. But since you mentioned that OpenAI was facing some issues with function calling in different languages, 5 minutes ago I removed all those examples and started a new job with only ASCII 128 characters.

  • Are you using all three roles in sequence: system, user, assistant? Are you extending that conversation example?
    Yes, function is also a role.

  • Attempting to train on more than 4k tokens per example in total?
    Yes, a lot of them are above 4k; the maximum goes up to 12k.


I have fine-tuned using GPT-3.5 Turbo multiple times over the past ~8 days, with training data of varying sizes ranging from several hundred up to several thousand examples. No issues on my end.

Sorry to hear that you are facing these troubles!


Hi @jr.2509, thank you for participating in the conversation. Just curious about the following things.

  1. Were you using gpt-3.5-turbo-1106 as the base model?
  2. Did you add any examples with content around 12k tokens?
  3. Were there any characters outside of ASCII 128 in your training data?

  1. Yes
  2. No, mine were of a shorter size
  3. On a very, very limited basis

Oh, got it, @jr.2509.

Sorry, but I have two more questions:

  1. When was the last time you trained? I have been facing the issue for the last 2 days.
  2. And did you have function calls in the training data?

Also, next time you try, it will be really helpful if you can let me know whether it works for you or not. Thank you.

No worries @yashukla :slight_smile: The last time was this morning, but it was just a very small file for a quick experiment. I have not included function calls in my training data so far.

I’ll be doing one more round of training on a larger file (3,000+ training examples) at some point tomorrow. Will let you know then whether it works.

Have you tried with smaller examples just for the sake of it? Do you have enough credit on your account?

I tried all permutations and combinations after you asked the question, @jr.2509.

Then I divided the main data into small batches and saw which batches were failing, analyzed the data of the failed ones, and then, at last, luckily, found the issue:

It was happening because of a structure error in the arguments of a function call.
OpenAI is not putting any validation check on these as of now, and I think fine-tuning suddenly started expecting only perfect function structures; that is what was causing the issues.

Also, I removed all the non-ASCII characters.
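
In case it helps anyone else hitting the same 500s: the check that finally caught it for me was, roughly, just trying to parse every function_call’s arguments as JSON before uploading. A minimal sketch of that idea, assuming the messages/function_call shape my data uses and a placeholder file name:

```python
# Minimal sketch: verify that every function_call in the training file carries
# arguments that parse as valid JSON. Assumes the "messages" / "function_call"
# shape; "training.jsonl" is a placeholder for your own file.
import json

with open("training.jsonl", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        if not line.strip():
            continue
        example = json.loads(line)
        for message in example.get("messages", []):
            call = message.get("function_call")
            if not call:
                continue
            try:
                json.loads(call.get("arguments", ""))
            except json.JSONDecodeError:
                print(f"line {line_no}: function_call '{call.get('name')}' "
                      f"has arguments that do not parse as JSON")
```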


Super glad to hear that all of your analysis paid off and you finally got it working again. Definitely some good learnings and insights from your case!