Multimodal (image) fine tuning with GPT-4

james.maine92 · April 20, 2024, 5:39pm

Hi there. Long time listener, first time caller.

Does the finetuning for GPT4 have the capability for finetune with image inputs.

The project i’m working on is getting good results, but needs the last little push to get to the point where i’m comfortable deploying it.

Cheers!

sps · April 20, 2024, 8:07pm

Welcome to the community @james.maine92

Fine-tuning is currently limited to these models.

james.maine92 · April 21, 2024, 9:12am

Thank you for your response.

I see that gpt-4-0613 is included within this list. As GPT-4 can accept multimodal input, does that extend to fine tuning as well?

jr.2509 · April 21, 2024, 9:29am

Hi James -

while GPT-4 is indeed a multimodal model, fine-tuning with images is currently not supported.

From the OpenAI documentation:

Can I fine-tune the image capabilities in `gpt-4`?

No, we do not support fine-tuning the image capabilities of gpt-4 at this time.

sps · April 21, 2024, 10:35am

Multi-modal inputs are accepted by gpt-4 turbo models. The 0613 didn’t come with vision enabled even though they demoed that during the launch.

james.maine92 · April 21, 2024, 3:20pm

Not what i wanted to hear but thank you for your response!

james.maine92 · April 21, 2024, 3:25pm

not what i wanted to hear, but thank you for your help!

sabitoff · June 23, 2024, 12:13am

Are there any plans for fine-tuning the ChatGPT Vision models?

jr.2509 · June 23, 2024, 5:50am

Technically gpt-4o is now available for fine-tuning. However, just like in the past with gpt-4, you must request access and describe you intended use case via a dedicated form to then maybe get access to it.

Only users with a decent track record of fine-tuning are given the option to request access though. You can check in the fine-tuning UI whether if you are eligible.

jonathan.roley · August 30, 2024, 1:12pm

Just checking in here, now that gpt-4o is generally available for fine-tuning, is fine-tuning with images supported?

jr.2509 · August 30, 2024, 1:20pm

Welcome to the Forum @jonathan.roley!

Fine-tuning with images is still not supported at this point and we have yet to hear from OpenAI when it will become available.

jonathan.roley · August 30, 2024, 2:42pm

Thank you, yes I confirmed this to be the case.

In case anyone else sees this before trying it, I uploaded a training dataset with image input and it failed with the message The job failed due to an invalid training file. Invalid file format. Please remove all images from your examples and try again.

veerabhadrarao.grand · September 5, 2024, 6:44am

How did you attach images in dataset,have you used Base64 encoding,please help me with dataset

jr.2509 · September 5, 2024, 7:08am

Hi @veerabhadrarao.grand and welcome to the Forum!

Fine-tuning with images is currently not supported.

However, if you are just looking for the API specs for using images in your regular API calls, then you can find the details here: https://platform.openai.com/docs/guides/vision

veerabhadrarao.grand · September 5, 2024, 7:39am

Thanks for the response But i am asking about how did you keep your images in jsonl file for finetuning,

i need that jsonl format for uploading images

jr.2509 · September 5, 2024, 7:46am

Thank you for clarifying. However, as indicated, it is currently not possible to fine-tune with images.

nphat44444 · September 5, 2024, 9:09am

Below is an example of the data format I am working with:

{
“messages”: [
{“role”: “user”, “content”: “What does this image represent?”, “image”: “data:image/jpeg;base64,<encoded_image_data>”},
{“role”: “assistant”, “content”: “This is the logout button, which means ‘sign out’.”}
]
}

In this example:

<encoded_image_data> represents the Base64 encoded string of the image.
The intent is to allow the model to process the image alongside the text input, then generate a relevant text-based response.

However, as indicated, it is currently not possible to fine-tune with images .

jonathan.roley · October 3, 2024, 1:22pm

Since October 1st you can now fine-tune GPT-4o with images, here’s the announcement. https://openai.com/index/introducing-vision-to-the-fine-tuning-api/

Topic		Replies	Views
Fine-tuning gpt-4o on image data API fine-tuning , fine-tune	9	1146	November 29, 2024
Fine-tuning gpt-4o-2024-08-06 with images? API fine-tuning	2	1483	October 3, 2024
Question on Finetuning: Can you hardcode images or upload image responses via the image_url subkey of content? API	3	215	August 27, 2024
Impact of Fine-Tuning GPT-4o on Multi-Modal Capabilities API gpt-4	2	274	May 1, 2025
Can I use images with fine-tuned model API image-reading , gpt-4o-mini	4	206	October 10, 2024

Multimodal (image) fine tuning with GPT-4

Can I fine-tune the image capabilities in gpt-4?

Related topics

Can I fine-tune the image capabilities in `gpt-4`?