An error occurred while processing file 'file-name' and it cannot be used for fine-tuning. Details may be available in the file's status_details

Hey, trying to perform some fine-tuning on my data. I previously posted on the forum around the format of my data. However, I have changed this and its now in this format:
{“messages”: [{“role”: “user”, “content”: “”}, {“role”: “assistant”, “content”: “”}]}
{“messages”: [{“role”: “assistant”, “content”: “”}, {“role”: “assistant”, “content”: “”}]} with each different message on different lines. It allowed me to create the file through openai with this command: res = openai.File.create(
file=open(“”, “r”),
purpose=‘fine-tune’
). However, when going to begin the finetuning on gpt 3.5 with the finetuning job with this command line: openai.FineTuningJob.create(training_file=file_id, model=“gpt-3.5-turbo”)
I get this error: InvalidRequestError: An error occurred while processing file ‘file-59qw23we8DqtWTgo4emnkxPS’ and it cannot be used for fine-tuning. Details may be available in the file’s status_details.
Two questions: How do i do I get access to the files status_details and also why am i getting this error and how do i get rid of it. Is it to do with including a /n at the end of each line?

Hey, you need to wait until the file status turns into ‘processed’. To check the file status, you could try the following commands, depending on if you know the file id:

openai.FineTuningJob.retrieve("the_file_id")

or

openai.FineTuningJob.list()

Nothing wrong with your request, perhaps some overload with the OpenAI servers. In my case I had to wait for almost a day before my files changed to ´processed’ status.

Hey thanks for the reply, but everything is telling me my data isnt in the correct format?

looks like your “messages” array is only 1 object. try merging the user and assistant content into a single array like this.

{“messages”:[{“role”:“user”,“content”:“XXX”},{“role”:“assistant”,“content”:“YYY”}]}

2 Likes

Thanks man this has been resolved now, that was one of the issues though!

Hello, seems you succeeded with fine-tuning. My jsonl dataset passes all the checks from openAI posted checks on the website but I get this error when I submit it for fine-tuning GPT3.5 Turbo : invalid training_file
What it the formatting in your file ? mine is OK about this {“messages”:[{“role”:“user”,“content”:“XXX”},{“role”:“assistant”,“content”:“YYY”}]} BUT does not work. Could it be from jsonl formatting of the text ? (the content has things like \n and \r\n, do you have these also ?)

Did you run OpenAIs token/error checking code on it?