I give 5 images to GPT-4 Vision and need it to identify 2 similar images. How?

I have a set of images.
After the identification from Vision, I want to separate out the images that belong together.

How can I do that? As far as I know, the model does not accept any metadata or names for the images, so how can I achieve this?

Welcome to the dev forum, @zainsheikh!

You can try prompting the model to return a JSON object with the indices of the matches.

Also, be aware that there’s a built-in safety mechanism which prevents abuse of the model for bypassing CAPTCHAs.
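For illustration, here is a minimal sketch of that approach with the openai Python SDK (v1.x). The image URLs and the "matches" key are made-up placeholders, and the prompt itself asks for a JSON-only reply rather than relying on JSON mode:

```python
# Minimal sketch, assuming the openai Python SDK v1.x. The image URLs and the "matches"
# key are placeholders; the indices refer to the order the images appear in the request.
from openai import OpenAI

client = OpenAI()

image_urls = [
    "https://example.com/shirt_front.jpg",
    "https://example.com/shirt_back.jpg",
    "https://example.com/mug.jpg",
    "https://example.com/hat.jpg",
    "https://example.com/poster.jpg",
]

content = [{
    "type": "text",
    "text": (
        "You are given 5 images, numbered 0 to 4 in the order they appear. "
        "Identify which images show the same product and reply with only a "
        'JSON object like {"matches": [[0, 1]]} using those indices.'
    ),
}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{"role": "user", "content": content}],
    max_tokens=300,
)
print(response.choices[0].message.content)
```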


I will try that. But how would I know which image relates to which entry in the JSON object?

Similar in what way? Could you not just create embeddings of the images and then compare them that way?

E.g., there are 5 images in total, and 2 of them relate to the same product, say the front and back of the same piece of clothing. The model can reply that the first two images are similar, but how would I know which images it is referring to? Are you getting my point?

Yes. You can accomplish the same effect by simply embedding the images yourself.
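For example, a rough sketch of that route with a local CLIP model via sentence-transformers (the OpenAI embeddings endpoint is text-only, so the model choice and file names here are assumptions):

```python
# Embed each image with CLIP and compare every pair by cosine similarity.
# File names are hypothetical placeholders.
from itertools import combinations

from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")
paths = ["front.jpg", "back.jpg", "mug.jpg", "hat.jpg", "poster.jpg"]

embeddings = model.encode([Image.open(p) for p in paths], convert_to_tensor=True)

for i, j in combinations(range(len(paths)), 2):
    score = util.cos_sim(embeddings[i], embeddings[j]).item()
    print(f"{paths[i]} <-> {paths[j]}: {score:.3f}")
```

The pair whose similarity stands out from the rest is your candidate for “front and back of the same product”.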

But if you must use GPT-4V, you could fit each product image into a 512x512 px tile and then apply a label to each one for GPT to refer to.
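Roughly like this, using PIL to pad each image to 512x512 and stamp its index in a corner (file names are placeholders):

```python
# Rough sketch of the labelling step: fit each product image into a 512x512 tile and
# stamp its index in the corner so GPT-4V can refer to "image 2" unambiguously.
from PIL import Image, ImageDraw, ImageOps

paths = ["front.jpg", "back.jpg", "mug.jpg", "hat.jpg", "poster.jpg"]  # placeholders

for idx, path in enumerate(paths):
    img = ImageOps.pad(Image.open(path).convert("RGB"), (512, 512), color="white")
    draw = ImageDraw.Draw(img)
    draw.rectangle([0, 0, 64, 40], fill="white")   # clear a corner for the label
    draw.text((12, 10), str(idx), fill="black")    # default PIL bitmap font
    img.save(f"labeled_{idx}.jpg")
```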


Just one thing: what do you mean by applying a label? Giving metadata for vector storage?

You can get the AI to identify the images by their index in the request you provided. In my experience, this works well for up to 5-6 images, but beyond that it bugs out and messes up the order or skips some indices. Just ask in the prompt for it to tell you which images are similar, by their indices. There is no way to provide an ID with each image as of today, which is a shame in my opinion, and I hope that feature gets added soon.

You already have the answer from the response:
"the first two images are similar"

But if you want to extract the specific indices, assuming you have access to the array of images you sent to GPT-4V, run the result through another Chat Completions API call with the response format set to JSON mode, using a system prompt like this:

You are a helpful assistant designed to output JSON.  
You will help extract the indices of items in an array based on the ordinal numbers mentioned in a text. 
# example
If you provide me with a text like this: 
"There are 5 images in total. The first two images are similar.", 
you will understand that it refers to the indices 0 and 1 in an array (since we start counting from zero).
# output JSON format
{ "images": [0, 1] }

Sample GPT-4V:

Running the post processing…
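Something along these lines, assuming the openai Python SDK v1.x and a JSON-mode-capable model such as gpt-3.5-turbo-1106; the GPT-4V answer below is just an illustrative stand-in:

```python
# Post-processing sketch: feed the GPT-4V answer to a second, JSON-mode chat completion
# that maps the ordinals ("second", "fifth") back to array indices.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful assistant designed to output JSON.\n"
    "You will help extract the indices of items in an array based on the ordinal "
    "numbers mentioned in a text.\n"
    "# example\n"
    'If you provide me with a text like this: "There are 5 images in total. The first '
    'two images are similar.", you will understand that it refers to the indices 0 and 1 '
    "in an array (since we start counting from zero).\n"
    "# output JSON format\n"
    '{ "images": [0, 1] }'
)

# Hypothetical GPT-4V output for the five product images
vision_answer = "There are 5 images in total. The second and fifth images are similar."

completion = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": vision_answer},
    ],
)

indices = json.loads(completion.choices[0].message.content)["images"]
print(indices)  # e.g. [1, 4]
```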

You’ll end up with:

{ "images": [1, 4] }

This may work well for 4-5 images, or some other small number. But what if you want to pass 400 images? That wouldn't be reliable at all and would be prone to hallucinations, no?

I tested your question to see if it can indeed handle such a case. Let's say I got the following response from GPT-4V:

A blue dot is found in the following images, from 102nd to 128th, 146th, 138th and from 251st to 286th images out of 400 images submitted.

Using the conversion prompt, we get:

{ "images": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 145, 137, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286] }

Please note that the ordinal numbers are one greater than the array indices, because array indices start at 0.
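If you would rather not trust the model with the arithmetic at that scale, a small parser can expand the ordinals and ranges deterministically. A rough sketch, covering only the "Nth" and "from Nth to Mth" phrasings used in the response above:

```python
# Deterministically convert ordinals and ordinal ranges from the GPT-4V answer to 0-based indices.
import re

text = ("A blue dot is found in the following images, from 102nd to 128th, 146th, 138th "
        "and from 251st to 286th images out of 400 images submitted.")

indices = []

# Ranges like "from 102nd to 128th" -> indices 101..127 inclusive
for start, end in re.findall(r"from (\d+)(?:st|nd|rd|th) to (\d+)(?:st|nd|rd|th)", text):
    indices.extend(range(int(start) - 1, int(end)))

# Standalone ordinals ("146th", "138th") that are not part of a range
remainder = re.sub(r"from \d+(?:st|nd|rd|th) to \d+(?:st|nd|rd|th)", "", text)
indices.extend(int(n) - 1 for n in re.findall(r"(\d+)(?:st|nd|rd|th)", remainder))

print(sorted(set(indices)))
```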


Very interesting!

I’m curious whether you verified that the returned indices were correct and not hallucinated.
I don’t know the specific images you’re testing with, but even if it were ~80% accurate, that would still be quite useful in my particular case.
