I have spent days preparing the training set for fine-tuning. But ever since I first created the training task using the API, it has always output the log message “Server error. Returning to queue for retry”. Then I have to wait another 3 hours, only to get the same message again and again until the maximum number of retries is exceeded. I am really wondering: is this feature actually available to use in 2023?
Experts, please help me with this! Am I the only one facing this issue?
I have been having the same issue since Friday. It keeps returning to the queue and retrying until it times out and fails completely. I emailed support but have not gotten a response.
I just tried curie, and it works. But davinci has never worked. I saw that the status page said the fine-tune enqueue issue was resolved last Friday, so I guess this is a new issue? I’m getting mad at it.
Maybe use Python or Ruby to access the API and not the CLI?
HTH
I tried using the Python library to submit the fine-tuning task, and it still gets stuck at “Created fine-tune: xxxx” without any queue info. I also tried fine-tuning the curie model several times using the CLI, and those worked. I guess the issue only happens with the davinci model.
You should consider using the fine-tune retrieve API method to get the detailed job report based on the FT ID.
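For reference, here is a minimal Python sketch of that retrieve call using only the standard library. It assumes the legacy 2023-era Fine-tunes endpoint `GET https://api.openai.com/v1/fine-tunes/{id}` that this thread was using (OpenAI has since deprecated that API, so check the current docs), an API key supplied by the caller, and hypothetical helper names `build_retrieve_request` and `retrieve_fine_tune`.

```python
import json
import urllib.request

# Assumption: the legacy Fine-tunes API base URL used in early 2023 (now deprecated).
LEGACY_FINE_TUNES_URL = "https://api.openai.com/v1/fine-tunes"


def build_retrieve_request(fine_tune_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single fine-tune job."""
    return urllib.request.Request(
        f"{LEGACY_FINE_TUNES_URL}/{fine_tune_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def retrieve_fine_tune(fine_tune_id: str, api_key: str) -> dict:
    """Send the request and return the full job report, including its "events" list."""
    request = build_retrieve_request(fine_tune_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage, with a placeholder ID: `print(json.dumps(retrieve_fine_tune("ft-...", os.environ["OPENAI_API_KEY"]), indent=2))` (this also needs `import os`). Pasting that whole printed output, events included, gives helpers the complete job report rather than a partial view.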
HTH
It doesn’t help. The queue info came out after 10 hours, but it still loops into the server error and retries.
Sorry @tylooterry
If you expect help, you should post the JSON output of the detailed job report from the retrieve call so we can see it.
Otherwise, you are simply asking us to guess and stab at replies in thin air, making things up without facts.
Hi @tylooterry
Thanks, but that is not a complete listing of the data from the retrieve API. Here is an example:
Notice that all the “Events” are included in the job info. Please post that information.
In other words, please copy-and-paste all the data from the FT retrieve API call. Providing all the job data returned from the API is much better for folks trying to help you than posting a partial screenshot.
Thanks!
{
"created_at": 1677218545,
"events": [
{
"created_at": 1677218545,
"level": "info",
"message": "Created fine-tune: ft-5KhuOqCd2r1QYaiFLZsyOdIC",
"object": "fine-tune-event"
},
{
"created_at": 1677252974,
"level": "info",
"message": "Fine-tune costs $1.94",
"object": "fine-tune-event"
},
{
"created_at": 1677252975,
"level": "info",
"message": "Fine-tune enqueued. Queue number: 12",
"object": "fine-tune-event"
},
{
"created_at": 1677252983,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 11",
"object": "fine-tune-event"
},
{
"created_at": 1677253282,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 10",
"object": "fine-tune-event"
},
{
"created_at": 1677253563,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 9",
"object": "fine-tune-event"
},
{
"created_at": 1677253634,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 8",
"object": "fine-tune-event"
},
{
"created_at": 1677253663,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 7",
"object": "fine-tune-event"
},
{
"created_at": 1677253844,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 6",
"object": "fine-tune-event"
},
{
"created_at": 1677253892,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 5",
"object": "fine-tune-event"
},
{
"created_at": 1677254106,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 4",
"object": "fine-tune-event"
},
{
"created_at": 1677254189,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 3",
"object": "fine-tune-event"
},
{
"created_at": 1677254308,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 2",
"object": "fine-tune-event"
},
{
"created_at": 1677254445,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 1",
"object": "fine-tune-event"
},
{
"created_at": 1677254532,
"level": "info",
"message": "Fine-tune started",
"object": "fine-tune-event"
},
{
"created_at": 1677257034,
"level": "warn",
"message": "Server error. Returning to queue for retry",
"object": "fine-tune-event"
},
{
"created_at": 1677257111,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 1",
"object": "fine-tune-event"
},
{
"created_at": 1677257116,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 0",
"object": "fine-tune-event"
},
{
"created_at": 1677257242,
"level": "info",
"message": "Fine-tune started",
"object": "fine-tune-event"
},
{
"created_at": 1677259733,
"level": "warn",
"message": "Server error. Returning to queue for retry",
"object": "fine-tune-event"
},
{
"created_at": 1677259742,
"level": "info",
"message": "Fine-tune started",
"object": "fine-tune-event"
},
{
"created_at": 1677262311,
"level": "warn",
"message": "Server error. Returning to queue for retry",
"object": "fine-tune-event"
},
{
"created_at": 1677262415,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 15",
"object": "fine-tune-event"
},
{
"created_at": 1677262431,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 14",
"object": "fine-tune-event"
},
{
"created_at": 1677262459,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 13",
"object": "fine-tune-event"
},
{
"created_at": 1677262498,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 12",
"object": "fine-tune-event"
},
{
"created_at": 1677262528,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 11",
"object": "fine-tune-event"
},
{
"created_at": 1677262667,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 10",
"object": "fine-tune-event"
},
{
"created_at": 1677262678,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 9",
"object": "fine-tune-event"
},
{
"created_at": 1677262678,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 8",
"object": "fine-tune-event"
},
{
"created_at": 1677262719,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 7",
"object": "fine-tune-event"
},
{
"created_at": 1677262839,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 6",
"object": "fine-tune-event"
},
{
"created_at": 1677262867,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 5",
"object": "fine-tune-event"
},
{
"created_at": 1677262974,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 4",
"object": "fine-tune-event"
},
{
"created_at": 1677263038,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 3",
"object": "fine-tune-event"
},
{
"created_at": 1677263131,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 2",
"object": "fine-tune-event"
},
{
"created_at": 1677263200,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 1",
"object": "fine-tune-event"
},
{
"created_at": 1677263206,
"level": "info",
"message": "Fine-tune is in the queue. Queue number: 0",
"object": "fine-tune-event"
},
{
"created_at": 1677263245,
"level": "error",
"message": "Exceeded allowed number of retries. Fine-tune failed. For help, please contact OpenAI and include your fine-tune ID: ft-5KhuOqCd2r1QYaiFLZsyOdIC",
"object": "fine-tune-event"
}
],
"fine_tuned_model": null,
"hyperparams": {
"batch_size": 1,
"learning_rate_multiplier": 0.2,
"n_epochs": 2,
"prompt_loss_weight": 0.01
},
"id": "ft-5KhuOqCd2r1QYaiFLZsyOdIC",
"model": "davinci",
"object": "fine-tune",
"organization_id": "org-9YDuehY46yIterI2NCR5ArGs",
"result_files": [],
"status": "failed",
"training_files": [
{
"bytes": 94992,
"created_at": 1677218528,
"filename": "file",
"id": "file-eSSOUMrGZoFi4AFEB7Lax0Jw",
"object": "file",
"purpose": "fine-tune",
"status": "processed",
"status_details": null
}
],
"updated_at": 1677263245,
"validation_files": []
}
I guess this is what you want.
Thanks a lot!
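One way to read an event log like this is to measure how long each attempt actually ran before the server error. Below is a small Python sketch under the assumption that every event has a `created_at` Unix timestamp and a `message` string, as in the JSON posted above; `run_durations` and `load_job` are hypothetical helper names.

```python
import json


def load_job(path: str) -> dict:
    """Load a saved copy of the fine-tune retrieve output."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def run_durations(job: dict) -> list:
    """Seconds between each "Fine-tune started" event and the next "Server error" warning."""
    durations = []
    started_at = None
    # Sort defensively by timestamp; sorted() is stable, so equal timestamps keep their order.
    for event in sorted(job["events"], key=lambda e: e["created_at"]):
        message = event["message"]
        if message == "Fine-tune started":
            started_at = event["created_at"]
        elif message.startswith("Server error") and started_at is not None:
            durations.append(event["created_at"] - started_at)
            started_at = None  # the job went back to the queue
    return durations
```

Applied to the log above, the three attempts ran 2502 s, 2491 s, and 2569 s (roughly 41 to 43 minutes) before each “Server error” warning, so every attempt failed after about the same amount of running time.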
Yeah, and now I see what you saw before. There is a “server error”.
My best guess is that the API does not like something in your training data (your JSONL file).
The first thing I would try will probably not matter, but I would rename your training file and add the .jsonl file extension. You are using the filename file, and I would change that to file.jsonl.
The second thing I would try is not loading such a large file set until you figure this out.
Maybe reduce your JSONL file to only 5 lines and give it a try, and make sure you validate your JSONL data.
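A minimal Python sketch of those two steps: check that every line parses as a JSON object with string `prompt` and `completion` fields, then keep only the first few lines for a small test upload. The `prompt`/`completion` field names are an assumption based on the legacy fine-tuning format in use at the time, and `validate_jsonl_lines` and `first_n_lines` are hypothetical helper names.

```python
import json
from itertools import islice


def validate_jsonl_lines(lines, required_keys=("prompt", "completion")):
    """Return (line_number, problem) tuples; an empty list means every line looks OK."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for key in required_keys:
            # Assumed legacy format: both fields present and both strings.
            if not isinstance(record.get(key), str):
                problems.append((number, f"missing or non-string {key!r}"))
    return problems


def first_n_lines(lines, n=5):
    """Keep only the first n non-blank lines, e.g. to build a tiny test file."""
    return list(islice((line for line in lines if line.strip()), n))
```

For example, read the file with `open("file.jsonl", encoding="utf-8").read().splitlines()`, fix every reported line number, then write `first_n_lines(lines, 5)` back out, joined with newlines, as a small test file.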
HTH
Thanks for your suggestion. But I tried the same dataset on the other three models and it worked perfectly. The file I uploaded was named normally, with the .jsonl extension. I have no idea where the filename returned by the API came from.
BTW, Davinci can handle up to 4000 tokens (prompt + completion), can’t it?
I also saw some developers on Discord say that when they were fine-tuning the Davinci model, they had the same “Server Error” issue. But they just kept retrying by creating new tasks, and eventually it succeeded. Maybe there is an internal issue?
Forgot to say: yesterday I tried using the dataset from the example, and it’s not working either.
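If you want to try the “just keep resubmitting” approach without babysitting it, here is a hedged Python sketch. `submit_and_wait` is a hypothetical callable you would supply: it should create a fresh fine-tune job and block until that job reaches a final status, returning the status string. The exponential backoff between submissions is an arbitrary choice for illustration, not something OpenAI prescribes.

```python
import time


def resubmit_until_success(submit_and_wait, max_attempts=3, base_delay_s=60.0, sleep=time.sleep):
    """Call submit_and_wait() until it returns "succeeded" or attempts run out.

    Returns the list of final statuses, one per attempt.
    """
    statuses = []
    for attempt in range(max_attempts):
        status = submit_and_wait()
        statuses.append(status)
        if status == "succeeded":
            break
        if attempt < max_attempts - 1:
            # Wait longer before each fresh submission: 60 s, 120 s, 240 s, ...
            sleep(base_delay_s * 2 ** attempt)
    return statuses
```

Each attempt is a brand-new job, so keep `max_attempts` small; the sketch only automates the manual resubmissions described in the Discord reports.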
Well, when I call the API to list the FT jobs, the training file name with its extension is always correct, so it comes from your code. My code never drops any file name extension:
Example
{"object"=>"file", "id"=>"file-fu4ouhstKNgCjQcppEa6iyoq", "purpose"=>"fine-tune", "filename"=>"fine_tune_1676871006.jsonl", "bytes"=>101, "created_at"=>1676871007, "status"=>"processed", "status_details"=>nil}
Filename
"filename"=>"fine_tune_1676871006.jsonl"
It’s always good to reduce file size when there is an error and focus on getting less data working, so I’m not interested in what max_tokens may or may not be at this point in the debugging workflow.
I simply told you what I would do to debug a problem such as this; you don’t have to follow my suggestions if you don’t want to.
Honestly, I have never had a “server error” returned from a FT.
Take care and best of luck. Sorry I could not help you.
I have tried the following debugging process:
- I used exactly the same dataset provided in the official documentation’s example, which is around 3 rows.
- I uploaded it to Davinci three times, and it failed every time, as I said.
- I uploaded it to Curie several times; the queuing was fast, and it took only 40 minutes to generate the model.
Maybe you can try fine-tuning a model using Davinci on your end? Because I have been hitting this issue since the beginning of this week.
OK. I will do this for you now.
Stand by @tylooterry
Status: Pending
Checking FT job info: still pending
Update: Running OK, fine-tuning will complete soon.
Succeeded
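The pending, running, succeeded sequence above can be watched with a simple polling loop. This Python sketch assumes the job’s final statuses are `succeeded`, `failed`, and `cancelled` (the thread shows “failed” and “Succeeded”; `cancelled` is an assumption), and that `fetch_status` is a hypothetical callable returning the current `status` string, for example from the fine-tune retrieve API’s JSON.

```python
import time

# Assumed set of terminal job statuses.
FINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_until_final(fetch_status, interval_s=60.0, timeout_s=6 * 3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a final status appears; return the status changes seen."""
    history = []
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if not history or history[-1] != status:
            history.append(status)  # record only changes, e.g. pending -> running
        if status in FINAL_STATUSES:
            return history
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

A long timeout is deliberate here: the log earlier in the thread shows a single davinci job spending hours between queueing and its final status.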
Hope this helps.
Yeah, I just resubmitted the task, and today it is way faster than yesterday. Thanks for your help. Mine worked!