Vision finetuning - "job failed due to a file format error" on previously working file/official cookbook example

A file that previously worked fine suddenly started triggering a file format error. To sanity check, I ran the vision fine tuning example from the cookbook

cookbook.openai com/examples/multimodal/vision_fine_tuning_on_gpt4o_for_visual_question_answering,

but still receive the same error. What might be causing this? Again, I’ve been able to successfully train on this data in the past, and I downloaded the official github notebook

github com/openai/openai-cookbook/blob/main/examples/multimodal/Vision_Fine_tuning_on_GPT4o_for_Visual_Question_Answering.ipynb

, which uses HF datasets, and executed it and I am getting the same error.

Model
gpt-4o-2024-08-06
Status
Failed
Error
The job failed due to a file format error in the training file. Error processing images. Please re-upload the file and try again.
Job ID
ftjob-rlqe1kK1u3fSArPXU5P76tE9
Training Method
Supervised
Base model
gpt-4o-2024-08-06
Created at
Jan 9, 2026, 4:32 AM
Data sharing
Private
Trained tokens
-
Hyperparameters
Epochs
auto
Batch size
auto
LR multiplier
auto
Seed
1897123651
Files
Training
ocr-vqa-train.jsonl
Validation
ocr-vqa-validation.jsonl

MessagesMetricsModeration Checks

04:37:32

The job failed due to a file format error in the training file. Error processing images. Please re-upload the file and try again.

04:33:07

Scanning images in file file-UbgAVHAhAVGy49oNigkYbt. This may take some time.

04:33:05

Scanning images in file file-TB6bmeK4vv3Tz4fBzxNZEQ. This may take some time.

04:32:59

Validating training file: file-UbgAVHAhAVGy49oNigkYbt and validation file: file-TB6bmeK4vv3Tz4fBzxNZEQ

04:32:59

Created fine-tuning job: ftjob-rlqe1kK1u3fSArPXU5P76tE9

1 Like

In addition, I created a sample of the data with working image URLs from the example incase there was an issue with the base64 encoding. Below is the data I used (truncated to one record) - same thing, gets stuck in validating files and eventually errors.

Note that I had to put for the url because the editor doesn’t allow links for some reason. But the link I was using was

upload.wikimediaDOTorg/wikipedia/commons/3/36/Danbo_Cheese.jpg (dot is a period - the link comes from the example and I’ve validated that the links are actually accessible )

Click here for sample data

{“messages”: [{“role”: “system”, “content”: [{“type”: “text”, “text”: “\nGenerate an answer to the question based on the image of the book provided.\nQuestions will include both open-ended questions and binary \“yes/no\” questions.\nThe questions will inquire about the title, author, edition, year and genre of the book in the image.\n\nYou will read the question and examine the corresponding image to provide an accurate answer.\n\n# Steps\n\n1. **Read the Question:** Carefully analyze the question to understand what information is being asked.\n2. **Examine the Image:**\n - **Identify Relevant Bounding Boxes (if applicable):** For questions requiring specific details like the title or author, focus on the relevant areas or bounding boxes within the image to extract the necessary text. There may be multiple relevant bounding boxes in the image, so be sure to consider all relevant areas.\n - **Analyze the Whole Image:** For questions that need general reasoning (e.g., \“Is this book related to Children’s Books?\”), consider the entire image, including title, graphics, colors, and overall design elements.\n3. **Formulate a Reasoned Answer:**\n - For binary questions (yes/no), use evidence from the image to support your answer.\n - For open-ended questions, provide the exact text from the image or a concise phrase that best describes the requested information.\n\n# Output Format\n\n- Provide your answer in a concise and clear manner. Always return the final conclusion only, no additional text or reasoning.\n- If the question is binary, answer with \“Yes\” or \“No.\”\n- For open-ended questions requesting specific details (e.g., title, author), return the exact text from the image.\n- For questions about general attributes like \“genre,\” return a single word or phrase that best describes it.\n\n# Notes\n\n- Always prioritize accuracy and clarity in your responses.\n- If multiple authors are listed, return the first author listed.\n- If the information is not present in the image, try to reason about the question using the information you can gather from the image e.g. if the author is not listed, use the title and genre to find the author.\n- Ensure reasoning steps logically lead to the conclusions before stating your final answer.\n\n# Examples\nYou will be provided with examples of questions and corresponding images of book covers, along with the reasoning and conclusion for each example. Use these examples to guide your reasoning process.”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 1:**\n\n**Question:** Who wrote this book?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Reasoning:** The cover clearly displays two authors’ names, ‘Evelyn M. Thomson’ and ‘Orlen N. Johnson,’ at the bottom of the cover, with Evelyn M. Thomson listed first. Typically, the first-listed author is considered the primary author or main contributor.\n\n**Conclusion:** Evelyn Thomson”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 2:**\n\n**Question:** What is the title of this book?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Answer:**\n\n**Reasoning:** The cover prominently displays the title across the top and center of the image. The full title reads, ‘Computer Systems: An Integrated Approach to Architecture and Operating Systems,’ with each component of the title clearly separated and formatted to stand out.\n\n**Conclusion:** Computer Systems: An Integrated Approach to Architecture and Operating Systems”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 3:**\n\n**Question:** Is this book related to Children’s Books?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Answer:**\n\n**Reasoning:** The cover illustration features a whimsical mermaid holding a red shoe, with gentle, child-friendly artwork that suggests it is targeted toward a young audience. Additionally, the style and imagery are typical of children’s literature.\n\n**Conclusion:** Yes”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 4:**\n\n**Question:** Is this book related to History?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Answer:**\n\n**Reasoning:** The title ‘Oliver Wendell Holmes, Jr.: Civil War Soldier, Supreme Court Justice’ clearly indicates that this book focuses on the life of Oliver Wendell Holmes, Jr., providing a biographical account rather than a general historical analysis. Although it references historical elements (Civil War, Supreme Court), the primary focus is on the individual rather than historical events as a whole.\n\n**Conclusion:** No”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 5:**\n\n**Question:** What is the genre of this book?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Answer:**\n\n**Reasoning:** The cover prominently features an image of a train station and the title ‘Railway Depots, Stations & Terminals,’ which directly suggests a focus on railway infrastructure. This points to the book being related to topics within Engineering & Transportation.\n\n**Conclusion:** Engineering & Transportation”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “**Example 6:**\n\n**Question:** What type of book is this?”}, {“type”: “image_url”, “image_url”: {“url”: }}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “**Answer:**\n\n**Reasoning:** The title ‘Principles and Practice of Modern Chromatographic Methods’ suggests a focus on chromatography, a scientific technique used in chemistry and biology. This aligns with the academic and technical nature typical of books in the ‘Science & Math’ category.\n\n**Conclusion:** Science & Math”}]}, {“role”: “user”, “content”: [{“type”: “text”, “text”: “Question [0]: Who wrote this book?”}, {“type”: “image_url”, “image_url”: {“url”: “”}}]}, {“role”: “assistant”, “content”: [{“type”: “text”, “text”: “Andrew Kipple”}]}]}

I tested the same file that has been on the server for months and ran successfully before in testing what models were completely denying images. Copying the training file ID right out of a previous success. A new fine tuning based on gpt-4o to see this issue.

The job failed due to an unsafe training file. Error processing images. Please re-upload the file and try again.

That took three hours.

The file has 10 of the same API calls with one image for vision.

This is a broken machine. Plus not even a “delete” method to get rid of failed jobs, which is the majority in the job listing confirming other’s issues.

2 Likes