Fine Tuning with the CLI never works! Always Server Error

tylooterry · February 23, 2023, 5:53am

I spent days preparing the training set for fine-tuning. But ever since I first created the training job through the API, it has always logged "Server error. Returning to queue for retry". Then I have to wait another 3 hours only to get the same message again and again, until the maximum number of retries is exceeded. I am really wondering: is this feature actually usable in 2023?

tylooterry · February 23, 2023, 1:39pm

Experts, please help me with this! Am I the only one facing this issue?

mateo.navia · February 23, 2023, 3:20pm

I have been having the same issue since Friday. It keeps returning to the queue and retrying until it times out and fails completely. I emailed support but have not gotten a response.

tylooterry · February 24, 2023, 3:08am

I just tried curie, and it works. But davinci has never worked. I saw the status page say that the fine-tune enqueue issue was resolved last Friday, so I guess this is a new issue? It's driving me mad.

ruby_coder · February 24, 2023, 3:25am

Maybe use Python or Ruby to access the API and not the CLI?

HTH

tylooterry · February 24, 2023, 6:36am

I tried using the Python library to submit the fine-tuning job, and it still gets stuck at "Created fine-tune: xxxx" without any queue info. I also fine-tuned the curie model several times using the CLI, and those worked. I guess the issue only happens with the davinci model.

ruby_coder · February 24, 2023, 9:00am

You should consider using the fine-tune retrieve API method to get the detailed job report for your fine-tune (FT) ID.

HTH
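A minimal sketch of turning that detailed job report into a readable timeline, assuming you have already saved the retrieve output (a job object with an "events" array, like the one posted later in this thread) as a Python dict. The function name format_events is hypothetical; it only reads the created_at, level and message fields that appear in the job JSON.

```python
from datetime import datetime, timezone


def format_events(job: dict) -> list[str]:
    """Render each fine-tune event as 'UTC timestamp [level] message'."""
    lines = []
    for event in job.get("events", []):
        # created_at is a Unix timestamp in seconds
        ts = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
        lines.append(f"{ts:%Y-%m-%d %H:%M:%S} [{event['level']}] {event['message']}")
    return lines


if __name__ == "__main__":
    # Two events copied from the job report posted in this thread
    job = {
        "id": "ft-5KhuOqCd2r1QYaiFLZsyOdIC",
        "events": [
            {"created_at": 1677218545, "level": "info",
             "message": "Created fine-tune: ft-5KhuOqCd2r1QYaiFLZsyOdIC"},
            {"created_at": 1677257034, "level": "warn",
             "message": "Server error. Returning to queue for retry"},
        ],
    }
    for line in format_events(job):
        print(line)
```

Printing in UTC keeps the timeline unambiguous when people in different time zones compare logs.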

tylooterry · February 24, 2023, 1:55pm

It doesn't help. The queue info showed up after 10 hours, but it still loops into the server error and retry.

ruby_coder · February 24, 2023, 2:04pm

Sorry @tylooterry

If you expect help, you should post the JSON output of the retrieve detailed job report so we can see it.

Otherwise, you are simply asking us to guess and stab at replies in thin air, making things up without facts.

tylooterry · February 25, 2023, 2:04am

This JSON was retrieved through the fine_tune.list API.

ruby_coder · February 25, 2023, 2:07am

Hi @tylooterry

Thanks, but that is not a complete listing of the data from the retrieve API. Here is an example:

Notice that all the "events" are included in the job info. Please post that information:

In other words, please copy-and-paste all the data from the FT retrieve API call. Providing all the job data returned from the API is much better for folks trying to help you than posting a partial screenshot.

Thanks!

tylooterry · February 25, 2023, 3:05am

{
  "created_at": 1677218545,
  "events": [
    {
      "created_at": 1677218545,
      "level": "info",
      "message": "Created fine-tune: ft-5KhuOqCd2r1QYaiFLZsyOdIC",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677252974,
      "level": "info",
      "message": "Fine-tune costs $1.94",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677252975,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 12",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677252983,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 11",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253282,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 10",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253563,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 9",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253634,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 8",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253663,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 7",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253844,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 6",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677253892,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 5",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677254106,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677254189,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 3",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677254308,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 2",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677254445,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 1",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677254532,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677257034,
      "level": "warn",
      "message": "Server error. Returning to queue for retry",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677257111,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 1",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677257116,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677257242,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677259733,
      "level": "warn",
      "message": "Server error. Returning to queue for retry",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677259742,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262311,
      "level": "warn",
      "message": "Server error. Returning to queue for retry",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262415,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 15",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262431,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 14",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262459,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 13",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262498,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 12",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262528,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 11",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262667,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 10",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262678,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 9",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262678,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 8",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262719,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 7",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262839,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 6",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262867,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 5",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677262974,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677263038,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 3",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677263131,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 2",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677263200,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 1",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677263206,
      "level": "info",
      "message": "Fine-tune is in the queue. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677263245,
      "level": "error",
      "message": "Exceeded allowed number of retries. Fine-tune failed. For help, please contact OpenAI and include your fine-tune ID: ft-5KhuOqCd2r1QYaiFLZsyOdIC",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": null,
  "hyperparams": {
    "batch_size": 1,
    "learning_rate_multiplier": 0.2,
    "n_epochs": 2,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-5KhuOqCd2r1QYaiFLZsyOdIC",
  "model": "davinci",
  "object": "fine-tune",
  "organization_id": "org-9YDuehY46yIterI2NCR5ArGs",
  "result_files": [],
  "status": "failed",
  "training_files": [
    {
      "bytes": 94992,
      "created_at": 1677218528,
      "filename": "file",
      "id": "file-eSSOUMrGZoFi4AFEB7Lax0Jw",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "updated_at": 1677263245,
  "validation_files": []
}

I guess this is what you want.

Thanks a lot!
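The event log above follows a clear pattern: the job reaches "Fine-tune started", runs for a while, then hits "Server error. Returning to queue for retry". A small sketch, assuming the events are available as a list of dicts, that measures how long each attempt ran before the server error (the function name attempt_durations is hypothetical):

```python
def attempt_durations(events: list[dict]) -> list[int]:
    """Seconds between each 'Fine-tune started' event and the next server-error event."""
    durations = []
    started_at = None
    # Sort by timestamp in case the events arrive out of order
    for event in sorted(events, key=lambda e: e["created_at"]):
        message = event["message"]
        if message == "Fine-tune started":
            started_at = event["created_at"]
        elif message.startswith("Server error") and started_at is not None:
            durations.append(event["created_at"] - started_at)
            started_at = None
    return durations


if __name__ == "__main__":
    err = "Server error. Returning to queue for retry"
    # The start/error pairs from ft-5KhuOqCd2r1QYaiFLZsyOdIC
    events = [
        {"created_at": 1677254532, "message": "Fine-tune started"},
        {"created_at": 1677257034, "message": err},
        {"created_at": 1677257242, "message": "Fine-tune started"},
        {"created_at": 1677259733, "message": err},
        {"created_at": 1677259742, "message": "Fine-tune started"},
        {"created_at": 1677262311, "message": err},
    ]
    durations = attempt_durations(events)
    print(len(durations), "failed attempts, minutes each:", [round(d / 60, 1) for d in durations])
```

For this job, all three attempts ran for roughly 41 to 43 minutes before failing. A consistent run time across retries is useful evidence to include when contacting support with the fine-tune ID.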

ruby_coder · February 25, 2023, 3:31am

Yeah, and now I see what you saw before. There is a “server error”.

My best guess is that the API does not like something in your training data (your JSONL file).

The first thing I would try will probably not matter, but I would rename your training file and add the .jsonl file extension.

You are using the filename "file", and I would change that to "file.jsonl".

The second thing I would try is to not upload such a large file until you figure this out.

Maybe reduce your JSONL file to only 5 lines, give it a try, and make sure you validate your JSONL data.

HTH
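A sketch of both suggestions, assuming the legacy prompt/completion format where every line is a JSON object with string "prompt" and "completion" fields. The helper names validate_jsonl and write_subset are hypothetical, and the checks are deliberately conservative (blank lines are reported rather than skipped):

```python
import json
from pathlib import Path


def validate_jsonl(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that are not valid prompt/completion records."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        if not raw.strip():
            problems.append((n, "blank line"))
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "not a JSON object"))
            continue
        for key in ("prompt", "completion"):
            if not isinstance(record.get(key), str):
                problems.append((n, f"missing or non-string '{key}'"))
    return problems


def write_subset(src: Path, dst: Path, n: int = 5) -> None:
    """Copy the first n non-blank lines of src into dst (give dst a .jsonl name)."""
    with src.open(encoding="utf-8") as f:
        kept = [line if line.endswith("\n") else line + "\n" for line in f if line.strip()][:n]
    dst.write_text("".join(kept), encoding="utf-8")
```

Running validate_jsonl over the full file first, then uploading the output of write_subset, separates "bad data" from "service problem" with very little cost.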

tylooterry · February 25, 2023, 3:37am

Thanks for your suggestion. But I tried the same dataset on the other three models, and it worked perfectly. The file I uploaded was named properly, with the .jsonl extension. I have no idea where the filename returned by the API came from.

BTW, Davinci can handle up to 4000 tokens (prompt + completion), can't it?

tylooterry · February 25, 2023, 3:41am

I also saw some developers on Discord say that when they were fine-tuning the Davinci model, they had the same "Server Error" issue. But they just kept retrying by creating new jobs, and eventually it succeeded. Maybe there is an internal issue?
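The workaround described on Discord, creating a fresh job whenever one fails, can be sketched as a loop. This is a sketch under assumptions: create_job and get_status are hypothetical callables you would wire to your own client (for example, fine-tune create and retrieve), and the status strings match the "status" field seen in the job JSON ("failed") and the "Succeeded" result reported later in this thread. Each new job logs its own "Fine-tune costs" event, so keep max_jobs small.

```python
import time
from typing import Callable

TERMINAL = ("succeeded", "failed", "cancelled")


def submit_with_resubmits(
    create_job: Callable[[], str],
    get_status: Callable[[str], str],
    max_jobs: int = 3,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a fine-tune job; if it ends without succeeding, create a new one, up to max_jobs times.

    Returns the ID of the job that succeeded; raises RuntimeError if every job failed.
    """
    for attempt in range(1, max_jobs + 1):
        job_id = create_job()  # hypothetical: wraps your client's create call
        while True:
            status = get_status(job_id)  # hypothetical: wraps your client's retrieve call
            if status in TERMINAL:
                break
            sleep(poll_seconds)
        if status == "succeeded":
            return job_id
        print(f"attempt {attempt}: job {job_id} ended with status {status!r}")
    raise RuntimeError(f"all {max_jobs} fine-tune jobs failed")
```

Injecting sleep and the two callables keeps the loop independent of any particular client version and makes it easy to dry-run with fakes before spending money on real jobs.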

tylooterry · February 25, 2023, 3:43am

Forgot to say: yesterday I tried using the dataset from the example, and it didn't work either.

ruby_coder · February 25, 2023, 3:44am

Well, when I use the API to list the FT jobs, the training file name always has the correct extension, so it comes from your code. My code never drops any file name extension:

Example

{"object"=>"file", "id"=>"file-fu4ouhstKNgCjQcppEa6iyoq", "purpose"=>"fine-tune", "filename"=>"fine_tune_1676871006.jsonl", "bytes"=>101, "created_at"=>1676871007, "status"=>"processed", "status_details"=>nil}

Filename

"filename"=>"fine_tune_1676871006.jsonl"

It's always good to reduce the file size when there is an error and focus on getting less data working, so I'm not interested in what max_tokens may or may not be at this point in the debugging workflow.

I simply told you what I would do to debug a problem such as this; you don't have to follow my suggestions if you don't want to.

Honestly, I have never had a “server error” returned from a FT.

Take care and best of luck. Sorry I could not help you.

tylooterry · February 25, 2023, 3:53am

I have tried the following debugging process:

I used exactly the same dataset provided in the official documentation's example, which is around 3 rows.
I submitted it to Davinci three times, and it failed every time, as I said.
I submitted it to Curie several times; the queuing was fast, and it only took 40 minutes to generate the model.

Maybe you can try fine-tuning a Davinci model on your end? I have been hitting this issue ever since the beginning of this week.

ruby_coder · February 25, 2023, 3:55am

OK. I will do this for you now.

Stand by @tylooterry

Status Pending

Checking FT Job Info - Still Pending

Update: running OK, will finish fine-tuning soon:

Succeeded

Hope this helps.

tylooterry · February 25, 2023, 4:20am

Yeah, I just resubmitted the job, and today it is way faster than yesterday. Thanks for your help. Mine worked.
