I want to train a large vision model (gpt-4 vision preview) to classify a database of images based on hand-generated labels. Is it possible to do this?
I am looking to call gpt-4 vision preview from the API and train it to classify images based on a dataset. Is it possible to do this?
Hello, you cannot currently fine-tune GPT-4-Vision on your own images.
Hi Elijah, I am building an LLM application with an image classifier. Given this constraint, would it be better to build a separate image classification model and access it using a function call?
If it is just an image classifier, then why do you want to custom-train it?
GPT-4-Vision might be fine for your use case in its base model.
The gpt-4 vision model gets the labels wrong frequently.
Then I would recommend using an open-source model that you can train specifically for your use case.
I see, I am looking to train the model to classify images of molecular orbitals like these ones:
Which open-source models should I use? How long would it take to train a trial run with a dataset of 100 or so hand-labelled images?
I am not very familiar with vision models, but I would recommend using a model like LLaVA on a custom dataset. I hope this answers your question.
For your reference, if you’re looking to perform an image classification task, you might find something useful in the following URL:
You can use a pre-trained ResNet model or train one from scratch, depending on the size of your dataset. Many deep learning frameworks, such as TensorFlow and PyTorch, provide pre-trained ResNet models that you can fine-tune on your specific dataset, which in your case means classifying images of molecular orbitals.
- Given the limited size of your dataset, it’s essential to use techniques like data augmentation to artificially increase the effective size of your dataset and help the model generalize better.
- Training time for a small dataset on a modern GPU might range from a few minutes to a couple of hours for a simple ResNet architecture.
@dignity_for_all I’ll add to that as well, Hugging Face has a service called “Autotrain” where you can easily train a classification model with minimal effort.
It worked quite well for me when I trained it to distinguish between different types of vehicle images (VIN photo, ID shot, mileage photo, etc.).