File format error in the training file

I am attempting to perform supervised fine-tuning on gpt-4.1-mini-2025-04-14 with a 7 GB JSONL training file containing 2.2 million samples, uploaded via the Uploads API (https://platform.openai.com/docs/api-reference/uploads). The POST requests to create the upload, add parts to it, and complete it all return a 200 status code, and the 7 GB file appears in the Storage tab of the OpenAI dashboard.

10-30 seconds after fine-tuning begins, I receive the following error:
"The job failed due to a file format error in the training file. Invalid file format. Example 1 is not a valid JSON object."

The first line of the training file (and every other line as well) follows the format below, where SYSTEM_MESSAGE, USER_PROMPT, RESPONSE, TITLE, and TAG are placeholders for the actual text. To confirm the upload worked properly, I downloaded the file back from OpenAI after uploading, and it matches the file that was originally uploaded in parts.
{"messages": [{"role": "system", "content": "SYSTEM_MESSAGE"}, {"role": "user", "content": "USER_PROMPT"}, {"role": "assistant", "function_call": {"name": "submit_response", "arguments": "{\"response\": \"RESPONSE\", \"title\": \"TITLE\", \"tag\": \"TAG\"}"}}]}
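For what it's worth, one way to guarantee the escaping in the nested `arguments` string can never go wrong is to build each line with `json.dumps` instead of string formatting. A minimal sketch (the placeholder values and the `build_example` helper name are just for illustration):

```python
import json

def build_example(system_msg, user_prompt, response, title, tag):
    """Build one JSONL training line; json.dumps handles all escaping."""
    # The function_call arguments field is itself a JSON-encoded string,
    # so it gets its own json.dumps pass before being embedded.
    arguments = json.dumps({"response": response, "title": title, "tag": tag})
    record = {
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_prompt},
            {
                "role": "assistant",
                "function_call": {"name": "submit_response", "arguments": arguments},
            },
        ]
    }
    return json.dumps(record)

line = build_example("SYSTEM_MESSAGE", "USER_PROMPT", "RESPONSE", "TITLE", "TAG")
print(line)
```

Generating lines this way (rather than templating strings by hand) rules out a stray unescaped quote or backslash in any of the 2.2 million samples.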

This example appears to be a valid JSON object. Crucially, running the same steps with a training file containing just the first 100 samples does not throw this error; validation succeeds and fine-tuning begins, which makes me think there isn’t actually any issue with Example 1.

Is it possible that there is some formatting error with a sample further down in the file, and the error is calling it Example 1? Is it possible that this error is being caused by unrecognized characters somewhere in the file? Finally, is it possible that the error is being caused by the size of the training data file? What would you recommend that I try next?
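In case it helps anyone debugging the same thing: the first two hypotheses can be ruled out locally by parsing every line before uploading. A minimal sketch, assuming the file is named `train.jsonl` (the filename and the `find_first_bad_line` helper are my own, not from any OpenAI tool):

```python
import json

def find_first_bad_line(path):
    """Return (line_number, reason) for the first invalid line, or None if all lines parse."""
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            # Non-UTF-8 bytes would also make a line unparseable as JSON.
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as e:
                return lineno, f"not valid UTF-8: {e}"
            text = text.strip()
            if not text:
                return lineno, "blank line"
            try:
                obj = json.loads(text)
            except json.JSONDecodeError as e:
                return lineno, f"invalid JSON: {e}"
            if not isinstance(obj, dict) or "messages" not in obj:
                return lineno, "not an object with a 'messages' key"
            # The nested function_call arguments must themselves be valid JSON.
            for msg in obj["messages"]:
                fc = msg.get("function_call")
                if fc is not None:
                    try:
                        json.loads(fc["arguments"])
                    except (KeyError, json.JSONDecodeError) as e:
                        return lineno, f"bad function_call arguments: {e}"
    return None

# print(find_first_bad_line("train.jsonl"))
```

If this returns None over all 2.2 million lines, that would point toward the size hypothesis rather than a malformed sample.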

Please let me know if any additional information would be helpful.

Thanks for the suggestion!

The issue is that I get the error about Example 1 when using all 2.2 million samples, but when using just the first 100 samples, I don’t get the error, and the fine-tuning begins successfully.

Does the format I included above make you think it’s an issue with the JSONL format? I put the first few lines of the file into a JSON validator and they came back as valid. The training file has one JSON object per line, which is correct for the JSONL format.
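One thing worth noting about the validator test: copy-pasting text into a web validator silently strips invisible bytes, so a leading UTF-8 BOM or similar junk in the raw file would not show up that way, but it would make the first example fail to parse. A quick byte-level check on the file itself (the filename and `check_leading_bytes` helper are just for illustration):

```python
def check_leading_bytes(path):
    """Inspect the raw bytes at the start of the file for invisible junk."""
    with open(path, "rb") as f:
        head = f.read(16)
    # A UTF-8 BOM (b'\xef\xbb\xbf') is invisible in most editors and is
    # stripped by copy-paste, but it would sit in front of the first '{'.
    if head.startswith(b"\xef\xbb\xbf"):
        return "file starts with a UTF-8 BOM"
    if head[:1] != b"{":
        return f"first byte is {head[:1]!r}, not '{{'"
    return "first bytes look fine"

# print(check_leading_bytes("train.jsonl"))
```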