My GPT is not reading images well enough

Hi there,

Disclaimer before I start: I’m not an engineer!

I’m trying to create a GPT that can read an image and count the number of adhesive tape pieces on the back of a bath shelf.

I have given it a database of 20 images, each labeled with the number of tape pieces, for the GPT to learn from.

It works ok when the tape pieces are clearly apart.

But not so well when they are close together (example below).


In this example the GPT counted 4 tape pieces, probably treating the upper and lower pieces as one although they are two separate pieces.

Any ideas on how I could improve the GPT’s performance? Some guidance on an efficient prompt, maybe?

thanks!

Did you try teaching it with the image above, the one that has two pieces right next to each other?

For your information, this is a section of the database I fed it (the second column tells the GPT the number of pieces of tape it should identify):

So to answer your question: yes. Row 4 is a cropped section of the whole picture, and it was still not able to identify it as two pieces of tape.


Even though the model can distinguish objects well, it doesn’t actually see them physically as humans do. Even a human, who does see physically, might count four pieces without having read your explanation or without sharp vision. In general, the model falls back on its basic capabilities unless it is given specific instructions.

For humans, ‘observation’ means a single way of seeing combined with cognitive processing of what is seen. The model ‘sees’ differently: it extracts features from the image (edges, for example) and then processes what it has extracted.

You could add instructions that tell it how to respond in specific situations, to compensate for the lack of training data covering those cases.
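As a rough illustration of what such instructions could look like, here is a minimal Python sketch that assembles an instruction text you could paste into the GPT's configuration. The function name `build_counting_instructions` and the wording of every rule are my own assumptions, not a tested recipe, so adjust them to what you actually see in your photos.

```python
def build_counting_instructions(object_name: str, surface: str) -> str:
    """Assemble instruction text for a counting GPT (hypothetical wording)."""
    # Each rule targets a situation the model may get wrong by default,
    # such as two pieces placed right next to each other.
    rules = [
        f"Count every separate piece of {object_name} on {surface}.",
        "Two pieces that touch or sit very close together are still two pieces.",
        "Treat a visible gap, seam, or change in edge alignment as a boundary between pieces.",
        "List each piece with its approximate position before giving the total.",
        "If you cannot tell whether a region is one piece or two, say so instead of guessing.",
    ]
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "You are counting objects in a photo.\n"
        f"{numbered}\n"
        "Finish with a line: TOTAL: <number>"
    )


instructions = build_counting_instructions("adhesive tape", "the back of a bath shelf")
print(instructions)
```

Asking for a list of pieces before the total is a design choice: it makes the model spell out what it thinks it sees, so you can spot where two pieces were merged into one.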

Current models like ChatGPT are intelligent enough to turn written instructions into actions that make up for missing information, but they are not smart enough to work out on their own what to do to correct or solve a problem.
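Since the model will not tell you by itself whether a new instruction helped, one option is to check its answers against the labeled images you already have. The sketch below is only an assumption about how that could be done: the function name `score_counts`, the file names, and all the counts are made up for illustration and are not taken from your database.

```python
def score_counts(expected: dict[str, int], predicted: dict[str, int]) -> tuple[float, list[str]]:
    """Compare the GPT's counts with the labeled counts.

    Returns the share of images counted correctly and the names of the
    images where the count was wrong or missing.
    """
    if not expected:
        raise ValueError("expected must contain at least one labeled image")
    misses = [name for name, count in expected.items() if predicted.get(name) != count]
    accuracy = 1 - len(misses) / len(expected)
    return accuracy, misses


# Hypothetical labels and hypothetical GPT answers, for illustration only.
expected = {"img_01.jpg": 4, "img_02.jpg": 5, "img_03.jpg": 3, "img_04.jpg": 2}
predicted = {"img_01.jpg": 4, "img_02.jpg": 4, "img_03.jpg": 3, "img_04.jpg": 1}

accuracy, misses = score_counts(expected, predicted)
print(f"accuracy: {accuracy:.0%}, wrong on: {misses}")
```

Running the same check before and after each change to the instructions shows whether the change fixed the close-together cases or only moved the errors elsewhere.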