I'm encountering an issue while uploading both training and validation files. Can you assist?

Hi, I’ve been struggling to execute fine-tuning on my training and validation data for quite some time. The console consistently throws an error related to one of the files. Both are in JSONL UTF-8 format, adhering to the specifications laid out in OpenAI’s documentation. This pertains to retraining the model.

Error: openai.error.InvalidRequestError: invalid training_file, field required

response = openai.FineTuningJob.create(
    model="ft:gpt-3.5-turbo-0613:persona",
    datasets=[
        {
            "file": "file-nAwUi8alZ2DN",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-YKXXYL0hfN739Ih",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)

The files seem correct, so I have two questions:

  1. Where might the issue lie?
  2. Does OpenAI use the validation file to adjust the weights via backpropagation for better results?

Thank you

Retraining an existing model is not supported. You can only create a new AI model with new data.

So how do I fine-tune an existing model? Thank you.

Fine-tuning an existing model is not supported. You can only create a new AI model with new data.

Can I continue fine-tuning a model that has already been fine-tuned?

No, we do not currently support continuing the fine-tuning process once a job has finished. We plan to support this in the near future.


response = openai.FineTuningJob.create(
    model="gpt-3.5-turbo-0613",
    datasets=[
        {
            "file": "file-Inw0pdUacX9Np6T6Ic",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-DL33nrMHWSBqCMk",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)

Same error again: openai.error.InvalidRequestError: invalid training_file, field required

Has a very similar training job already worked for you?

I would look for unescaped quotes or brackets within the strings, or if manually created, any missing closures of containers.

For gpt-3.5-turbo example conversations, the only line feed permitted is between examples. Within strings, \n is required.

You can do simple validation by splitting the file by lines, and ensuring each line will pass json.loads()
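That line-by-line check can be scripted in a few lines; a minimal sketch (the helper name validate_jsonl_lines is mine):

```python
import json

def validate_jsonl_lines(lines):
    """Return (line_number, error) for every line that fails json.loads()."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            bad.append((lineno, str(e)))
    return bad
```

To check a file, pass the open file object: validate_jsonl_lines(open("train.jsonl", encoding="utf-8")). An empty result means every line parsed.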

Run this API request against the files endpoint,

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.File.list()

Then find the entry with the training file you’re trying to use.

Post the relevant snippet of the response here.

Basically you want to ensure,

  1. The file you’re trying to use actually exists
  2. The file is a .jsonl file
  3. The purpose is fine-tune

If any of those three fail, then the file wasn’t properly uploaded and you’ll need to re-upload the file so you can use it for fine-tuning.
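Points 2 and 3 (plus the processing status) can be checked programmatically against one entry from the listing; a sketch (the helper name check_file_entry is mine, and finding the entry in the listing at all covers point 1):

```python
def check_file_entry(entry):
    """Flag problems with one parsed entry from the files listing (a plain dict)."""
    problems = []
    if not entry.get("filename", "").endswith(".jsonl"):
        problems.append("filename does not end in .jsonl")
    if entry.get("purpose") != "fine-tune":
        problems.append("purpose is not fine-tune")
    if entry.get("status") != "processed":
        problems.append("status is not processed")
    return problems
```

An empty list means the entry looks usable for fine-tuning.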

1 Like

{
  "object": "file",
  "id": "file-DL33nrMxLPkQnHWSBqCMk",
  "purpose": "fine-tune",
  "filename": "file",
  "bytes": 246848,
  "created_at": 1694697249,
  "status": "processed",
  "status_details": null
},
{
  "object": "file",
  "id": "file-Inw0RpDUhXpdX9Np6T6Ic",
  "purpose": "fine-tune",
  "filename": "file",
  "bytes": 986774,
  "created_at": 1694697246,
  "status": "processed",
  "status_details": null
},

Both the training and validation files exist. I deleted a few letters from the file IDs before sharing them here; is that a problem?

I ran fine-tuning on the training file alone, without the validation set, and it worked normally. The problem appears when I want to fine-tune with both the training and validation sets.

This code works:

job = openai.FineTuningJob.create(
    training_file=train_file_id,
    # test_file=test_file_id,
    model="gpt-3.5-turbo-0613"
)
This one returns an error:

response = openai.FineTuningJob.create(
    model="gpt-3.5-turbo-0613",
    datasets=[
        {
            "training_file": "file-Inw0RppdUacX9Np6T6Ic",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-DL33nrMxLPkQneWSBqCMk",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)
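For what it's worth, the fine-tuning endpoint documents flat training_file and validation_file keyword arguments rather than a datasets list, which would explain the "training_file, field required" message. A minimal sketch, assuming the 0.x openai Python library and reusing the file IDs from above:

```python
# Flat keyword arguments instead of a "datasets" list (sketch, not a verified run).
kwargs = {
    "model": "gpt-3.5-turbo-0613",
    "training_file": "file-Inw0RppdUacX9Np6T6Ic",
    "validation_file": "file-DL33nrMxLPkQneWSBqCMk",
}
# response = openai.FineTuningJob.create(**kwargs)
```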

Can you train on the validation file without errors (if this is a small inexpensive set)?

Yes, I can. Is the recommended practice to train on both the training and validation sets?

I was unable to run fine-tuning with the training and validation sets together, so I don't know what you mean now.

One thing that jumps out at me is the "filename": "file" in your File.list() response.

That should be an actual filename with a .jsonl extension (unless you edited it here for some reason).


I did not edit the file name. This is the code I use to upload the data:

# Paths to the files you want to upload
train_file_path = "/Users/Desktop/converted_data.jsonl"
test_file_path = "/Users/Desktop/validation_data_split.jsonl"

# Upload the training file to the OpenAI server
with open(train_file_path, "r", encoding="utf-8") as f:
    train_response = openai.File.create(
        file=f,
        purpose="fine-tune"
    )

# Upload the test file to the OpenAI server
with open(test_file_path, "r", encoding="utf-8") as f:
    test_response = openai.File.create(
        file=f,
        purpose="fine-tune"
    )

The training file and the validation file should have the same type and quality of inputs. You should be able to scramble examples between the two files and obtain similar quality of training.

The question I posed is if you were to take your file that you use as validation, and train an AI on that as the training file (with no validation specified), if the training job would run.

Obviously, machine verification of the file contents is less expensive.

Uploading the file again with a new name is free. It may not have been received properly although it was accepted.

Yes, I can run the training and validation files, but only separately. The format is always the same; in fact it is one whole set, which I split 80% to 20%.
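An 80/20 split like that can be done in a few lines; a sketch (the helper name split_jsonl_lines is mine; it shuffles first so both files get the same mix of examples, per the point above about scrambling):

```python
import random

def split_jsonl_lines(lines, train_fraction=0.8, seed=42):
    """Shuffle examples and split them into train and validation lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed for a reproducible split
    cut = int(len(lines) * train_fraction)
    return lines[:cut], lines[cut:]
```

Write each returned list back out, one example per line, into separate .jsonl files.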

The only thing I see here is you’re opening the file in r mode rather than rb.

Here is the OpenAI example code for an upload,

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.File.create(
  file=open("mydata.jsonl", "rb"),
  purpose='fine-tune'
)

My speculation is they are expecting a bytes object on their end, so the file may be getting mangled somewhere in the pipeline.

I would at least try to re-upload the file, opening it with rb, and seeing if that has any effect.


Thanks, I'll test this advice soon. I just have to get from A to B, so I'll take a break. Thanks for your time and advice.
