Are we able to do this?
Regarding this post:
/t/multimodal-image-fine-tuning-with-gpt-4/723327/3
The answer was no for GPT-4. Has anything changed?
Not yet. OpenAI is always looking at bringing new features to developers, and my understanding is that they would like to enable multimodal fine-tuning and will do so when it's ready.
This is a much-needed feature.
We do need this feature
Maybe this will end the era of Open-VLM
upvote!
Damn, I was really hoping the answer would be yes! I want to try training a model to map images to screen coordinates for mouse clicking.
It sounds like you’ve got a WFH job that has implemented some kind of activity tracking and you’re looking to build an even more advanced “mouse jiggler.”
Can we convert an image into text?
@shure.alpha do you mean OCR? The answer is yes