Can a train a gpt-4 vision model on a dataset of images?

I want to train a large vision model (gpt-4 vision preview) to classify a database of images based on hand-generated labels. Is it possible to do this?

1 Like

I am looking to call gpt-4 vision preview from the API and train it to classify images based on a dataset. Is is possible to do this?

1 Like

Hello, you can not currently fine-tune GPT-4-Vision on your own images.

1 Like

Hi Elijah, I am building an LLM application with an image classifier. Based on this constraint, it may just be better to build a image classification model, and access it using a function call?

1 Like

If it is just an image classifier, then why do you want to custom-train it?

GPT-4-Vision might be fine for your use case in its base model.

1 Like

The gpt-4 vision model gets the labels wrong frequently.

1 Like

Then I would recommend you to use an open-source model that you can train specifically for your use case.

2 Likes

I see, I am looking to train the model to classify images of molecular orbitals like these ones:

Which open-source models should I use? How long would it take to train a trial run with a dataset of 100 or so hand-labelled images?

1 Like

I am not very familiar with vision models, but I would recommend using a model like LLaVA on a custom dataset. I hope this answers your question.

1 Like

For your reference, if you’re looking to perform an image classification task, you might find something useful in the following URL:

4 Likes

you can use a pre-trained ResNet model or train one from scratch, depending on the size of your dataset. Many deep learning frameworks like TensorFlow and PyTorch provide pre-trained ResNet models that you can fine-tune on your specific dataset which for your case is to classify images of molecular orbitals

-Given the limited size of your dataset, it’s essential to use techniques like data augmentation to artificially increase the effective size of your dataset and help the model generalize better.
-training time for a small dataset on a modern GPU might range from a few minutes to a couple of hours for a simple ResNet architecture.

2 Likes

@dignity_for_all I’ll add to that as well, Hugging Face has a service called “Autotrain” where you can easily train a classification model with minimal effort.

It worked quite well for me when I trained it to distinguishing between different types of vehicle images. (VIN Photo, ID shot, Mileage photo, etc.)

AutoTrain – Hugging Face

2 Likes