Can I train a GPT-4 Vision model on a dataset of images?

romit.chakraborty · February 8, 2024, 5:43pm

I want to train a large vision model (gpt-4 vision preview) to classify a database of images based on hand-generated labels. Is it possible to do this?

romit.chakraborty · February 8, 2024, 5:45pm

I am looking to call gpt-4 vision preview from the API and train it to classify images based on a dataset. Is it possible to do this?

grandell1234 · February 8, 2024, 5:46pm

Hello, you cannot currently fine-tune GPT-4-Vision on your own images.

romit.chakraborty · February 8, 2024, 5:48pm

Hi Elijah, I am building an LLM application with an image classifier. Based on this constraint, would it be better to build a separate image classification model and access it using a function call?

grandell1234 · February 8, 2024, 5:49pm

If it is just an image classifier, then why do you want to custom-train it?

GPT-4-Vision might be fine for your use case in its base model.

romit.chakraborty · February 8, 2024, 5:50pm

The gpt-4 vision model gets the labels wrong frequently.

grandell1234 · February 8, 2024, 5:51pm

Then I would recommend using an open-source model that you can train specifically for your use case.

romit.chakraborty · February 8, 2024, 5:54pm

I see, I am looking to train the model to classify images of molecular orbitals like these ones:

Which open-source models should I use? How long would it take to train a trial run with a dataset of 100 or so hand-labelled images?

grandell1234 · February 8, 2024, 5:58pm

I am not very familiar with vision models, but I would recommend using a model like LLaVA on a custom dataset. I hope this answers your question.

dignity_for_all · February 8, 2024, 6:46pm

For your reference, if you’re looking to perform an image classification task, you might find something useful in the following URL:

shankar138089 · February 9, 2024, 6:56am

You can use a pre-trained ResNet model or train one from scratch, depending on the size of your dataset. Deep learning frameworks such as TensorFlow and PyTorch provide pre-trained ResNet models that you can fine-tune on your specific dataset, which in your case means classifying images of molecular orbitals.

- Given the limited size of your dataset, it’s essential to use techniques like data augmentation to artificially increase its effective size and help the model generalize better.
- Training time for a small dataset on a modern GPU might range from a few minutes to a couple of hours for a simple ResNet architecture.

trenton.dambrowitz · February 9, 2024, 9:48am

@dignity_for_all I’ll add to that as well: Hugging Face has a service called “AutoTrain” where you can easily train a classification model with minimal effort.

It worked quite well for me when I trained it to distinguish between different types of vehicle images (VIN photo, ID shot, mileage photo, etc.).

AutoTrain – Hugging Face

romit.chakraborty · March 13, 2024, 5:49pm

I am curating the dataset for the training, using a script to generate images of orbitals that I want the vision model to comprehend and classify. This is a sample image:

Should I generate many perspectives of the orbitals in the dataset?

sps · March 13, 2024, 6:32pm

What kind of classification are you trying to do? Can you elaborate on the number of categories?

romit.chakraborty · March 13, 2024, 7:20pm

Hi, sps. I was looking to label orbitals based on their chemical nature. A simple classification scheme for diatomic molecules would have four labels: sigma, sigma*, pi, and pi*. A more complicated classification would involve recognizing the molecular orbitals that make up the subset of orbitals (the active space) that are strongly correlated.
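
For the simple four-label scheme, one way to prepare a small hand-labelled set is to map each label to a class index and hold out a few images per class for validation. The sketch below uses only the standard library. The function name `stratified_split`, the 20% validation fraction, and the `(path, label)` input format are all assumptions for illustration, not part of any existing script in this thread.

```python
import random
from collections import defaultdict

# The four labels from the simple diatomic scheme, in a fixed order.
LABELS = ["sigma", "sigma*", "pi", "pi*"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label_name) pairs into train/val lists of (path, class_index).

    Each class contributes roughly `val_fraction` of its images to validation,
    so a rare label is not left out of either split by chance.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        # Raises KeyError for a label outside the four-label scheme.
        by_class[LABEL_TO_INDEX[label]].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for idx in range(len(LABELS)):
        paths = sorted(by_class.get(idx, []))
        rng.shuffle(paths)
        # Keep at least one validation image when a class has two or more images.
        n_val = max(1, round(len(paths) * val_fraction)) if len(paths) > 1 else 0
        val += [(p, idx) for p in paths[:n_val]]
        train += [(p, idx) for p in paths[n_val:]]
    return train, val
```

With 100 images spread evenly over the four labels, this would leave 80 for training and 20 for validation, 5 per label.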

yagizalanli · June 3, 2024, 4:25pm

Hey, how is your project going? I’m looking into training a GPT on handwritten letters, and I just saw your post and wondered.
