Fine-tuning job always fails with "server_error"

Hi,

I’m trying to create a new fine-tuned model but the related fine-tuning job always fails with a generic internal server error:

{
  "object": "fine_tuning.job",
  "id": "ftjob-kCVi2kkd9oUzjDEPPdsOjEQ9",
  "model": "gpt-3.5-turbo-0613",
  "created_at": 1696501201,
  "finished_at": 1696501758,
  "fine_tuned_model": null,
  "organization_id": "org-eh5H8U9HPIBDvNM4odWTP6Me",
  "result_files": [],
  "status": "failed",
  "validation_file": null,
  "training_file": "file-ZHZB2qVxcUpbBntvCnH8VdPc",
  "hyperparameters": {
    "n_epochs": 10
  },
  "trained_tokens": null,
  "error": {
    "code": "server_error",
    "param": null,
    "message": "The job failed due to an internal error"
  }
}

What could be the cause of this problem?

P.S.: I’m using a training file whose examples contain functions (function calling). See “Fine-tuning examples > Function calling” in the API docs (OpenAI Platform).

Thanks in advance for any help!

Hi and welcome to the Developer Forum!

Can you share a few entries from your training set, and also the command used to initiate the training, please?

Hi @Foxabilo,

Here is the training file (training_samples.jsonl) that I’m using, which contains 10 examples:
https://www.nurpoint.com/training_samples.jsonl

I’m using PHP (curl) to initiate the request by calling the following endpoint (passing the uploaded training file in the “training_file” parameter):
POST https://api.openai.com/v1/fine_tuning/jobs (OpenAI Platform)
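
For readers not using PHP, the request described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper name build_fine_tuning_request is hypothetical, the API key is a placeholder, and the function only builds the request without sending it. The endpoint, the "training_file" parameter, the model name and the n_epochs hyperparameter are the ones that appear in this thread.

```python
import json
import urllib.request

FINE_TUNING_JOBS_URL = "https://api.openai.com/v1/fine_tuning/jobs"


def build_fine_tuning_request(api_key, training_file_id,
                              model="gpt-3.5-turbo-0613", n_epochs=None):
    """Build (but do not send) the POST request that creates a fine-tuning job.

    Hypothetical helper for illustration; training_file_id is the ID returned
    when the training file was uploaded.
    """
    payload = {"training_file": training_file_id, "model": model}
    if n_epochs is not None:
        # Optional: the failed job in this thread used n_epochs = 10.
        payload["hyperparameters"] = {"n_epochs": n_epochs}
    return urllib.request.Request(
        FINE_TUNING_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (needs a real API key and network access):
#   req = build_fine_tuning_request(api_key, "file-ZHZB2qVxcUpbBntvCnH8VdPc")
#   with urllib.request.urlopen(req) as response:
#       job = json.load(response)
#       print(job["id"], job["status"])
```

Building the request separately from sending it makes it easy to print the exact JSON body and compare it with what the PHP code sends.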

Hi, I have had the same problem for the last hour and a half. I validated my JSONL file with the script in the documentation and it should be OK. Could it be an error on OpenAI’s side?

It could be an issue on the OpenAI side of things, but looking at the example file did not help much in this case, as it’s quite large and complex. It would be better to have a very simple file that just contains placeholders, so it’s easy to view and test.

Token counts of conversations seem to be nearing the limits allowed, and you also have functions.

Even without accounting for the overhead of converting the JSON to chat format, example 2 is 4268 tokens.
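
A quick way to spot oversized examples before uploading is to estimate the size of each line of the training file. The sketch below is a rough heuristic, not a real tokenizer: it assumes about 4 characters per token, which is only an approximation, and the function names and the token_limit argument are hypothetical. A real tokenizer library such as tiktoken would give exact counts.

```python
import json


def rough_token_estimate(example):
    """Very rough size estimate, ASSUMING about 4 characters per token."""
    text = json.dumps(example, ensure_ascii=False)
    return len(text) // 4


def flag_long_examples(jsonl_text, token_limit):
    """Return (line_number, estimate) for each example above token_limit.

    token_limit is supplied by the caller; check the documentation for the
    actual per-example limit of the model being fine-tuned.
    """
    flagged = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        estimate = rough_token_estimate(json.loads(line))
        if estimate > token_limit:
            flagged.append((line_number, estimate))
    return flagged
```

Because the estimate is coarse, it is best used to find examples that are clearly too long, not to decide borderline cases.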

I would suggest that this huge conversation only trains the AI how to respond after having received a huge conversation history. You should also include the shorter exchanges that grow to get you there.
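
One possible way to get those shorter exchanges is to derive them from the long conversation itself, cutting it after every assistant turn. The following is a minimal sketch of that idea, not an official recipe; the function name expand_into_growing_examples is hypothetical, and it assumes each example is a dict with a "messages" list (any other top-level keys, such as "functions", are copied unchanged).

```python
def expand_into_growing_examples(example):
    """Split one long conversation into examples of increasing length.

    One new example is produced per assistant message, containing the
    conversation up to and including that message.
    """
    messages = example["messages"]
    # Keep top-level keys other than "messages" (for example "functions").
    extra_keys = {key: value for key, value in example.items() if key != "messages"}
    expanded = []
    for index, message in enumerate(messages):
        if message.get("role") == "assistant":
            new_example = dict(extra_keys)
            new_example["messages"] = messages[: index + 1]
            expanded.append(new_example)
    return expanded
```

For a conversation with two assistant replies, this yields two training examples: one ending at the first reply and one containing the full history.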

For me the problem seems to be resolved. I was probably missing something in the file structure or the API call. Thanks for the support.

@mirko.artoni How many examples does your training file contain, and how long did the fine-tuning job take to complete?

I uploaded a simplified training file as suggested by @Foxabilo (just 10 examples) and it is taking FOREVER to complete (already over 2 hours)…

Actually my file is quite small, because I am just trying to specialize the model to extract some data from a text file. I have 10 examples, each with a “system” message, 2 “user” messages that always contain the same requests, a third “user” message with the text from which I need to extract the data, and finally the “assistant” message where I specify what data I want.
The structure is the following:

{
  "messages": [
    {
      "role": "system",
      "content": "ChatGPT you are a chatbot that extract data"
    },
    {
      "role": "user",
      "content": "message where is specified what data i need"
    },
    {
      "role": "user",
      "content": "another message where is specified other data i need"
    },
    {
      "role": "user",
      "content": "text"
    },
    {
      "role": "assistant",
      "content": "data"
    }
  ]
}

After I have composed all the messages, I validate the file as JSONL and send it to the API.
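
For anyone who wants a quick local check before uploading, a structural validation along these lines can catch common mistakes. This is a simplified sketch and not the validation script from the documentation; the function name validate_jsonl and the specific checks are assumptions. Note that JSONL requires each example to be a complete JSON object on a single line, so a pretty-printed structure like the one shown above has to be collapsed to one line per example in the actual file.

```python
import json

# Roles assumed valid for chat fine-tuning examples, including function calling.
ALLOWED_ROLES = {"system", "user", "assistant", "function"}


def validate_jsonl(jsonl_text):
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            errors.append(f"line {line_number}: empty line")
            continue
        try:
            example = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_number}: invalid JSON ({exc.msg})")
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {line_number}: missing or empty 'messages' list")
            continue
        roles = []
        for message in messages:
            role = message.get("role") if isinstance(message, dict) else None
            if role not in ALLOWED_ROLES:
                errors.append(f"line {line_number}: unexpected role {role!r}")
            roles.append(role)
        if "assistant" not in roles:
            errors.append(f"line {line_number}: no assistant message to learn from")
    return errors
```

Running it on the file contents and printing the returned list gives one message per problem, with the line number of the offending example.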