Can I train a GPT-4 Vision model on a dataset of images?

I want to train a large vision model (gpt-4 vision preview) to classify a database of images based on hand-generated labels. Is it possible to do this?

I am looking to call gpt-4 vision preview from the API and train it to classify images based on a dataset. Is it possible to do this?

Hello, you cannot currently fine-tune GPT-4-Vision on your own images.

Hi Elijah, I am building an LLM application with an image classifier. Given this constraint, would it be better to build a separate image classification model and access it through a function call?
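
A minimal sketch of that idea, assuming the OpenAI Chat Completions "tools" format: the classifier is described to the model as a function, and when the model asks for it, your application runs the real classifier locally and sends the result back as a "tool" message. The name `classify_image`, its `image_path` parameter, and the stub classifier are hypothetical placeholders, not part of any existing API.

```python
import json

# Tool definition in the Chat Completions "tools" format (JSON Schema parameters).
# The function name and parameters are hypothetical placeholders.
CLASSIFY_TOOL = {
    "type": "function",
    "function": {
        "name": "classify_image",
        "description": "Classify an image with the application's own trained classifier.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_path": {"type": "string", "description": "Path to the image file."}
            },
            "required": ["image_path"],
        },
    },
}


def classify_image(image_path: str) -> dict:
    """Hypothetical stub: replace the body with a call to your trained model."""
    return {"image_path": image_path, "label": "unknown", "confidence": 0.0}


# Map tool names the model may request to local Python functions.
TOOL_REGISTRY = {"classify_image": classify_image}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested local function and return its result as a JSON string.

    `name` and `arguments_json` correspond to the function name and the
    JSON-encoded arguments string found in a tool call from the model.
    """
    func = TOOL_REGISTRY[name]
    kwargs = json.loads(arguments_json)
    return json.dumps(func(**kwargs))
```

For example, `dispatch_tool_call("classify_image", '{"image_path": "orbital_001.png"}')` returns a JSON string you would pass back to the model with the matching `tool_call_id`. Keeping the classifier outside the LLM means you can retrain it on your hand labels without touching the rest of the application.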

If it is just an image classifier, then why do you want to custom-train it?

GPT-4-Vision might be fine for your use case in its base model.

The gpt-4 vision model gets the labels wrong frequently.

Then I would recommend using an open-source model that you can train specifically for your use case.

I see. I want to train the model to classify images of molecular orbitals like these:

Which open-source models should I use? How long would a trial training run take with a dataset of about 100 hand-labelled images?

I am not very familiar with vision models, but I would recommend fine-tuning a model like LLaVA on a custom dataset. I hope this answers your question.

For your reference, if you’re looking to perform an image classification task, you might find something useful at the following URL:

You can use a pre-trained ResNet model or train one from scratch, depending on the size of your dataset. Many deep learning frameworks, such as TensorFlow and PyTorch, provide pre-trained ResNet models that you can fine-tune on your specific dataset, which in your case means classifying images of molecular orbitals.

- Given the limited size of your dataset, it’s essential to use techniques like data augmentation to artificially increase its effective size and help the model generalize better.
- Training time for a small dataset on a modern GPU might range from a few minutes to a couple of hours for a simple ResNet architecture.
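
To make the augmentation point concrete, here is a minimal stdlib-only sketch, assuming each image is a grayscale grid stored as a list of pixel rows. It produces the eight rotations and mirror images of an image, so 100 images can become up to 800 training examples. In a real PyTorch pipeline you would more likely use torchvision transforms such as `RandomHorizontalFlip` and `RandomRotation`; this only illustrates the idea. Whether every rotation and flip preserves the label is an assumption you should check for your own classes.

```python
from typing import List

Image = List[List[int]]  # grayscale image as a list of pixel rows


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror an image left-to-right."""
    return [row[::-1] for row in img]


def dihedral_augment(img: Image) -> List[Image]:
    """Return the distinct rotations and reflections of an image (at most 8)."""
    variants: List[Image] = []
    current = img
    for _ in range(4):  # 0, 90, 180 and 270 degree rotations
        for candidate in (current, flip_horizontal(current)):
            if candidate not in variants:  # skip duplicates from symmetric images
                variants.append(candidate)
        current = rotate90(current)
    return variants
```

An image with no symmetry yields 8 variants, while a perfectly uniform image yields only 1, which is why duplicates are filtered out instead of inflating the dataset with copies.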

@dignity_for_all I’ll add to that: Hugging Face has a service called “AutoTrain” that lets you train a classification model with minimal effort.

It worked quite well for me when I trained it to distinguish between different types of vehicle images (VIN photo, ID shot, mileage photo, etc.).

AutoTrain – Hugging Face
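
Many image-classification tools expect one folder per class, with the folder name used as the label; this is the convention of torchvision’s `ImageFolder` and the Hugging Face `datasets` "imagefolder" loader. I am assuming AutoTrain accepts a similar layout, so check its documentation for the exact upload format. The sketch below copies hand-labelled images into `out_dir/<label>/<filename>`; the `filename,label` CSV it reads is a hypothetical input format.

```python
import csv
import shutil
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def safe_dirname(label: str) -> str:
    """Turn a label into a folder name. '*' is not allowed in Windows
    folder names, so e.g. 'sigma*' becomes 'sigma_star'."""
    return label.strip().replace("*", "_star")


def plan_copies(
    rows: Iterable[Dict[str, str]], image_dir: Path, out_dir: Path
) -> List[Tuple[Path, Path]]:
    """Map each {'filename', 'label'} row to a (source, destination) pair."""
    return [
        (image_dir / row["filename"], out_dir / safe_dirname(row["label"]) / row["filename"])
        for row in rows
    ]


def build_class_folders(labels_csv: Path, image_dir: Path, out_dir: Path) -> None:
    """Copy every image listed in the (hypothetical) CSV into its class folder."""
    with open(labels_csv, newline="") as f:
        copies = plan_copies(csv.DictReader(f), image_dir, out_dir)
    for src, dst in copies:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Separating `plan_copies` from the actual copying makes it easy to print and review the planned layout before anything is written to disk.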

I am curating the training dataset with a script that generates images of the orbitals I want the vision model to recognize and classify. This is a sample image:
Should I generate many perspectives of each orbital for the dataset?
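
If you do render several perspectives, one common way to choose evenly spread viewing directions is a Fibonacci (golden-angle) spiral on a sphere. This is a minimal sketch returning (azimuth, elevation) pairs in degrees; how your generation script turns these into camera settings is an assumption left to that script.

```python
import math
from typing import List, Tuple


def fibonacci_viewpoints(n: int) -> List[Tuple[float, float]]:
    """Return n (azimuth, elevation) pairs in degrees, spread roughly
    evenly over a sphere using the golden-angle (Fibonacci) spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 137.5 degrees
    views = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # evenly spaced heights in (-1, 1)
        elevation = math.degrees(math.asin(z))  # angle above the equator
        azimuth = math.degrees((i * golden_angle) % (2.0 * math.pi))
        views.append((azimuth, elevation))
    return views
```

Spacing the heights evenly in `z` (rather than in elevation angle) avoids clustering views near the poles, so each orientation is represented about equally in the dataset.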

What kind of classification are you trying to do? Can you elaborate on the number of categories?

Hi, sps. I was looking to label orbitals based on their chemical nature. A simple classification scheme for diatomic molecules would have four labels: sigma, sigma*, pi, and pi*. A more complicated classification would involve recognizing the molecular orbitals that make up a strongly correlated subset of orbitals (the active space).
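
Whichever model ends up doing the classification, the four-label scheme has to be mapped to fixed class indices. A minimal sketch; the enum name and the exact label spellings (e.g. `sigma*`) are assumptions.

```python
from enum import Enum


class OrbitalLabel(Enum):
    """Four-class scheme for diatomic molecular orbitals (hypothetical names)."""
    SIGMA = "sigma"
    SIGMA_STAR = "sigma*"
    PI = "pi"
    PI_STAR = "pi*"


# Class indices follow the order in which the members are declared.
LABEL_TO_INDEX = {label.value: i for i, label in enumerate(OrbitalLabel)}
INDEX_TO_LABEL = {i: value for value, i in LABEL_TO_INDEX.items()}


def encode(label: str) -> int:
    """Convert a hand-written label such as ' Pi* ' to its class index.

    Unknown labels raise ValueError (from the Enum lookup), which catches
    typos in the hand labels early.
    """
    return LABEL_TO_INDEX[OrbitalLabel(label.strip().lower()).value]
```

Fixing the index order in one place keeps the training labels and the model’s output layer consistent if the active-space classes are added later.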

Hey, how is your project going? I’m working on training a GPT on handwritten letters, and I just saw your post and was curious.