GPT4O finetuning with vision capabilities

Hi community,

Does anyone know how to finetune the vision capabilities of gpt-4o? It seems the training data only accepts .jsonl files. Where shoud we put image data?

Thanks!

You can not currently fine-tune the image properties.

3 Likes

not even if you encode the image into a base_64 string and use the in the content along with the file type like: “role”: “user”,
“content”: [
{
“type”: “image_url”,
“image_url”: {“url”: img_dat},
},
{
“type”: “text”,
“text”: “can you turn this invoice into a csv file?”,
},
], where img_dat is the base_64 str