Multimodal gpt-4o-mini Fine tuning

Are we able to do this?

for this post:
/t/multimodal-image-fine-tuning-with-gpt-4/723327/3

the answer was no for gpt-4. any change of conditions?

2 Likes

Not yet. OpenAI is always looking at bringing new features to developers and my understanding is that they would like to enable multimodal fine-tuning and will when it’s ready.

1 Like

This is a much needed feature

1 Like

We do need this feature
Maybe this will end the era of Open-VLM
upvote!

1 Like

Damn I was really hoping the answer would be yes! I want to try and train image to screen coords for mouse clicking

2 Likes

:rofl:

It sounds like you’ve got a WFH job that has implemented some kind of activity tracking and you’re looking to build an even more advanced “mouse jiggler.”

Can we convert image into text?