Multimodal (image) fine tuning with GPT-4

Hi there. Long time listener, first time caller.

Does the finetuning for GPT4 have the capability for finetune with image inputs.

The project i’m working on is getting good results, but needs the last little push to get to the point where i’m comfortable deploying it.

Cheers!

5 Likes

Welcome to the community @james.maine92

Fine-tuning is currently limited to these models.

3 Likes

Thank you for your response.

I see that gpt-4-0613 is included within this list. As GPT-4 can accept multimodal input, does that extend to fine tuning as well?

Hi James -

while GPT-4 is indeed a multimodal model, fine-tuning with images is currently not supported.


From the OpenAI documentation:

Can I fine-tune the image capabilities in gpt-4?

No, we do not support fine-tuning the image capabilities of gpt-4 at this time.

3 Likes

Multi-modal inputs are accepted by gpt-4 turbo models. The 0613 didn’t come with vision enabled even though they demoed that during the launch.

2 Likes

Not what i wanted to hear but thank you for your response!

1 Like

not what i wanted to hear, but thank you for your help!

Are there any plans for fine-tuning the ChatGPT Vision models?

Technically gpt-4o is now available for fine-tuning. However, just like in the past with gpt-4, you must request access and describe you intended use case via a dedicated form to then maybe get access to it.

Only users with a decent track record of fine-tuning are given the option to request access though. You can check in the fine-tuning UI whether if you are eligible.

Just checking in here, now that gpt-4o is generally available for fine-tuning, is fine-tuning with images supported?

Welcome to the Forum @jonathan.roley!

Fine-tuning with images is still not supported at this point and we have yet to hear from OpenAI when it will become available.

Thank you, yes I confirmed this to be the case.

In case anyone else sees this before trying it, I uploaded a training dataset with image input and it failed with the message The job failed due to an invalid training file. Invalid file format. Please remove all images from your examples and try again.

1 Like

How did you attach images in dataset,have you used Base64 encoding,please help me with dataset

Hi @veerabhadrarao.grand and welcome to the Forum!

Fine-tuning with images is currently not supported.

However, if you are just looking for the API specs for using images in your regular API calls, then you can find the details here: https://platform.openai.com/docs/guides/vision

1 Like

Thanks for the response But i am asking about how did you keep your images in jsonl file for finetuning,

i need that jsonl format for uploading images

Thank you for clarifying. However, as indicated, it is currently not possible to fine-tune with images.

1 Like

Below is an example of the data format I am working with:

{
“messages”: [
{“role”: “user”, “content”: “What does this image represent?”, “image”: “data:image/jpeg;base64,<encoded_image_data>”},
{“role”: “assistant”, “content”: “This is the logout button, which means ‘sign out’.”}
]
}

In this example:

<encoded_image_data> represents the Base64 encoded string of the image.
The intent is to allow the model to process the image alongside the text input, then generate a relevant text-based response.

However, as indicated, it is currently not possible to fine-tune with images :grinning: :grinning:.

1 Like

Since October 1st you can now fine-tune GPT-4o with images, here’s the announcement. https://openai.com/index/introducing-vision-to-the-fine-tuning-api/

2 Likes