Fine-tuning gpt-4o-2024-08-06 with images?

I am trying to extract complex data from images. Normally, I include examples in my API request, but that takes a lot of tokens.

I am now trying to fine-tune the gpt-4o-2024-08-06, giving examples of how to answer those images:

jsonl file:
{“messages”: [{“role”: “system”, “content”: “My explenation”}, {“role”: “user”, “content”: [{“type”: “image_url”, “image_url”: {“url”: “data:image/jpg;base64, base64_string”}}, {“type”: “text”, “text”: “Do that with this picture”}]}, {“role”: “assistant”, “content”: “Correct answer”}]}
{“messages”: [{“role”: “system”, “content”: “My explenation”}, {“role”: “user”, “content”: [{“type”: “image_url”, “image_url”: {“url”: “data:image/jpg;base64, base64_string”}}, {“type”: “text”, “text”: “Do that with this picture”}]}, {“role”: “assistant”, “content”: “Correct answer”}]}

But I get this error: “The job failed due to an invalid training file. Invalid file format. Please remove all mentions of ‘image_url’ from your file and try again.”

Is it even possible to fine-tune a model with images? What am I doing wrong here?

Hi @Rednas07

As of writing this, fine-tuning gpt-4o-mini is only for text inputs and outputs.

2 Likes

@Rednas07 this functionality is now available as of October 1st. Here’s the announcement. https://openai.com/index/introducing-vision-to-the-fine-tuning-api/

1 Like