I'm encountering an issue while uploading both training and validation files. Can you assist?

Hi, I’ve been struggling to execute fine-tuning on my training and validation data for quite some time. The console consistently throws an error related to one of the files. Both are in JSONL UTF-8 format, adhering to the specifications laid out in OpenAI’s documentation. This pertains to retraining the model.

Error: openai.error.InvalidRequestError: invalid training_file, field required

response = openai.FineTuningJob.create(
    model="ft:gpt-3.5-turbo-0613:persona",
    datasets=[
        {
            "file": "file-nAwUi8alZ2DN",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-YKXXYL0hfN739Ih",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)

The files seem correct, so I have two questions:

  1. Where might the issue lie?
  2. Does OpenAI use the validation file to adjust the weights via backpropagation for better results?

Thank you

Retraining an existing model is not supported. You can only create a new AI model with new data.

So how do I fine-tune an existing model? Thank you.

Fine-tuning an existing model is not supported. You can only create a new AI model with new data.

Can I continue fine-tuning a model that has already been fine-tuned?

No, we do not currently support continuing the fine-tuning process once a job has finished. We plan to support this in the near future.


response = openai.FineTuningJob.create(
    model="gpt-3.5-turbo-0613",
    datasets=[
        {
            "file": "file-Inw0pdUacX9Np6T6Ic",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-DL33nrMHWSBqCMk",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)

Same error again: openai.error.InvalidRequestError: invalid training_file, field required

Has a very similar training job already worked for you?

I would look for unescaped quotes or brackets within the strings, or if manually created, any missing closures of containers.

For gpt-3.5-turbo example conversations, the only line feed permitted is between examples. Within strings, \n is required.

You can do simple validation by splitting the file by lines, and ensuring each line will pass json.loads()
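That line-by-line check can be scripted in a few lines; a minimal sketch (the helper name validate_jsonl_lines is mine):

```python
import json

def validate_jsonl_lines(lines):
    """Return (line_number, error) for every line that fails json.loads()."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            bad.append((lineno, str(e)))
    return bad
```

To check a file, pass the open file object: validate_jsonl_lines(open("train.jsonl", encoding="utf-8")). An empty result means every line parsed.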

Run this API request against the files endpoint,

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.File.list()

Then find the entry with the training file you’re trying to use.

Post the relevant snippet of the response here.

Basically you want to ensure,

  1. The file you’re trying to use actually exists
  2. The file is a .jsonl file
  3. The purpose is fine-tune

If any of those three fail, then the file wasn’t properly uploaded and you’ll need to re-upload the file so you can use it for fine-tuning.
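Points 2 and 3 (plus the processing status) can be checked programmatically against one entry from the listing; a sketch (the helper name check_file_entry is mine, and finding the entry in the listing at all covers point 1):

```python
def check_file_entry(entry):
    """Flag problems with one parsed entry from the files listing (a plain dict)."""
    problems = []
    if not entry.get("filename", "").endswith(".jsonl"):
        problems.append("filename does not end in .jsonl")
    if entry.get("purpose") != "fine-tune":
        problems.append("purpose is not fine-tune")
    if entry.get("status") != "processed":
        problems.append("status is not processed")
    return problems
```

An empty list means the entry looks usable for fine-tuning.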

1 Like

{
  "object": "file",
  "id": "file-DL33nrMxLPkQnHWSBqCMk",
  "purpose": "fine-tune",
  "filename": "file",
  "bytes": 246848,
  "created_at": 1694697249,
  "status": "processed",
  "status_details": null
},
{
  "object": "file",
  "id": "file-Inw0RpDUhXpdX9Np6T6Ic",
  "purpose": "fine-tune",
  "filename": "file",
  "bytes": 986774,
  "created_at": 1694697246,
  "status": "processed",
  "status_details": null
},

Both the training and validation files exist. I deleted a few letters from the file IDs before sharing them here; is that a problem?

I ran fine-tuning on the training file alone, without the validation set, and it worked normally. The problem appears when I want to fine-tune with both the training and validation sets.

This code works:

job = openai.FineTuningJob.create(
    training_file=train_file_id,
    # test_file=test_file_id,
    model="gpt-3.5-turbo-0613"
)
This one returns an error:

response = openai.FineTuningJob.create(
    model="gpt-3.5-turbo-0613",
    datasets=[
        {
            "training_file": "file-Inw0RppdUacX9Np6T6Ic",
            "name": "train_data",
            "split": "train",
        },
        {
            "file": "file-DL33nrMxLPkQneWSBqCMk",
            "name": "validation_data",
            "split": "validation",
        },
    ]
)
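For what it's worth, the fine-tuning endpoint documents flat training_file and validation_file keyword arguments rather than a datasets list, which would explain the "training_file, field required" message. A minimal sketch, assuming the 0.x openai Python library and reusing the file IDs from above:

```python
# Flat keyword arguments instead of a "datasets" list (sketch, not a verified run).
kwargs = {
    "model": "gpt-3.5-turbo-0613",
    "training_file": "file-Inw0RppdUacX9Np6T6Ic",
    "validation_file": "file-DL33nrMxLPkQneWSBqCMk",
}
# response = openai.FineTuningJob.create(**kwargs)
```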

Can you train on the validation file without errors (if this is a small inexpensive set)?

Yes, I can. Is the recommended practice to train on both the training and validation sets?

I was unable to run fine-tuning with the training and validation sets together, so I don't know what you mean now.

One thing that jumps out at me is the "filename": "file" in your File.list() response.

That should be an actual filename with a .jsonl extension (unless you edited it here for some reason).


I did not edit the file name. This is the code I use to upload the data:

# Paths to the files you want to upload
train_file_path = "/Users/Desktop/converted_data.jsonl"
test_file_path = "/Users/Desktop/validation_data_split.jsonl"

# Upload the training file to the OpenAI server
with open(train_file_path, "r", encoding="utf-8") as f:
    train_response = openai.File.create(
        file=f,
        purpose="fine-tune"
    )

# Upload the test file to the OpenAI server
with open(test_file_path, "r", encoding="utf-8") as f:
    test_response = openai.File.create(
        file=f,
        purpose="fine-tune"
    )

The training file and the validation file should have the same type and quality of inputs. You should be able to scramble examples between the two files and obtain similar quality of training.

The question I posed is if you were to take your file that you use as validation, and train an AI on that as the training file (with no validation specified), if the training job would run.

Obviously, machine verification of the file contents is less expensive.

Uploading the file again with a new name is free. It may not have been received properly although it was accepted.

Yes, I can run the training and validation files, but only separately. The format is always the same; in fact it is one whole set, which I split 80% to 20%.
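An 80/20 split like that can be done in a few lines; a sketch (the helper name split_jsonl_lines is mine; it shuffles first so both files get the same mix of examples, per the point above about scrambling):

```python
import random

def split_jsonl_lines(lines, train_fraction=0.8, seed=42):
    """Shuffle examples and split them into train and validation lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed for a reproducible split
    cut = int(len(lines) * train_fraction)
    return lines[:cut], lines[cut:]
```

Write each returned list back out, one example per line, into separate .jsonl files.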

The only thing I see here is you’re opening the file in r mode rather than rb.

Here is the OpenAI example code for an upload,

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.File.create(
  file=open("mydata.jsonl", "rb"),
  purpose='fine-tune'
)

My speculation is they are expecting a bytes object on their end, so the file may be getting mangled somewhere in the pipeline.

I would at least try to re-upload the file, opening it with rb, and seeing if that has any effect.


Thanks, I'll test this advice soon. I just have to get from A to B, so I'll take a break. Thanks for your time and advice.
