I want to train a large vision model (gpt-4 vision preview) to classify a database of images based on hand-generated labels. Is it possible to do this?
I am looking to call gpt-4 vision preview through the API and train it to classify images based on a dataset. Is it possible to do this?
Hello, you cannot currently fine-tune GPT-4-Vision on your own images.
Hi Elijah, I am building an LLM application with an image classifier. Given this constraint, would it be better to build a separate image classification model and access it through a function call?
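To make the function-call idea concrete, here is a minimal sketch of how a separately trained classifier could be exposed to the LLM as a tool. The tool name `classify_orbital_image`, its `image_path` parameter, and the stubbed classifier are all hypothetical; the dictionary follows the JSON-schema style of tool definition used for chat-completions function calling, and the actual API request is left out.

```python
import json

# Tool definition in the JSON-schema style used for function calling.
# The name, description, and parameters are hypothetical placeholders.
CLASSIFY_TOOL = {
    "type": "function",
    "function": {
        "name": "classify_orbital_image",
        "description": "Classify an image with a separately trained model.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_path": {
                    "type": "string",
                    "description": "Path to the image file to classify.",
                }
            },
            "required": ["image_path"],
        },
    },
}


def classify_orbital_image(image_path: str) -> dict:
    """Stub standing in for the custom classifier (assumption, not a real model)."""
    # A real implementation would load the image and run model inference here.
    return {"image_path": image_path, "label": "PLACEHOLDER_LABEL"}


# Map tool names the LLM may request to local Python callables.
HANDLERS = {"classify_orbital_image": classify_orbital_image}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool and return a JSON string to send back to the LLM."""
    if name not in HANDLERS:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)  # the model supplies arguments as a JSON string
    return json.dumps(HANDLERS[name](**args))
```

With this layout the LLM never sees the pixels; it only decides when to call the tool and then reads back the label, so the accuracy of the classification depends entirely on the custom model.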
If it is just an image classifier, then why do you want to custom-train it?
GPT-4-Vision might be fine for your use case in its base model.
The gpt-4 vision model gets the labels wrong frequently.
Then I would recommend using an open-source model that you can train specifically for your use case.
I see, I am looking to train the model to classify images of molecular orbitals like these ones:
Which open-source models should I use? How long would it take to train a trial run with a dataset of 100 or so hand-labelled images?
I am not very familiar with vision models, but I would recommend using a model like LLaVA on a custom dataset. I hope this answers your question.
For your reference, if you’re looking to perform an image classification task, you might find something useful in the following URL:
You can use a pre-trained ResNet model or train one from scratch, depending on the size of your dataset. Deep learning frameworks such as TensorFlow and PyTorch provide pre-trained ResNet models that you can fine-tune on your own dataset, which in your case means classifying images of molecular orbitals.
-Given the limited size of your dataset, it’s essential to use techniques like data augmentation to artificially increase its effective size and help the model generalize better.
-Training time for a small dataset on a modern GPU might range from a few minutes to a couple of hours for a simple ResNet architecture.
@dignity_for_all I’ll add to that as well: Hugging Face has a service called “Autotrain” where you can train a classification model with minimal effort.
It worked quite well for me when I trained it to distinguish between different types of vehicle images (VIN photo, ID shot, mileage photo, etc.).
I am curating the dataset for the training, using a script to generate images of orbitals that I want the vision model to comprehend and classify. This is a sample image:
Should I generate many perspectives of the orbitals in the dataset?
What kind of classification are you trying to do? Can you elaborate on the number of categories?
Hi, sps. I was looking to label orbitals based on their chemical nature. A simple classification scheme for diatomic molecules would have four labels: sigma, sigma*, pi, and pi*. A more complicated classification would involve recognizing the molecular orbitals that make up the subset of strongly correlated orbitals (the active space).
Hey, how is your project going? I’m looking into training a GPT on handwritten letters, and I just saw your post and wondered.