How to compare the similarity of 2 images using the OpenAI API

How can I check the similarity of 2 images using OpenAI? For example, how could I check the similarity of photos taken of 2 printed images?

Hi @arunantonyholmes

You can include the images in the API call to the gpt-4-vision-preview model and ask it if the images are similar and what category they belong to.


Hi @arunantonyholmes ,
depending on the purpose behind it (e.g. facial recognition, checking for common objects, semantic similarity, etc.), I can see two main approaches: you can directly use the GPT-4 Vision model's API, or you can first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using cosine similarity. But there might be other approaches depending on the final purpose of your comparison, if you could explain it a bit more.


Hi vasil,

I tried using the GPT-4 Vision model directly, but it does not give a similarity measurement; it gives text describing the image. I also asked it for the cosine similarity, but it won't give the result. Instead it gave some text saying I should use some other approach… But what I want is the similarity measurement, as you suggest in the second approach:

“first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using cosine similarity.” Could you explain it in detail? Is the OpenAI Embedding model also available in the API?

OpenAI has an embeddings API endpoint, but embedding similarity is based on the tokenizer, so the similarity of base64-encoded strings may not be what you expect.

If you really need a 0~1 numeric similarity, you may need to use the vision API to generate descriptions of the two images first, then calculate similarity based on the embeddings of those two strings.
If you don't need such a scientific numeric result, you can define your similarity rules with simple examples in your instruction prompts and ask the model to reply based on the two images it sees.
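A sketch of the first idea, assuming the vision model has already produced two text descriptions (the embedding model name is illustrative, and `OPENAI_API_KEY` is needed for the `embed` call; `cosine_similarity` itself is pure Python):

```python
# Sketch: embed two image *descriptions* (obtained from the vision model)
# and score them with cosine similarity.
import json
import math
import os
import urllib.request

def embed(text: str) -> list[float]:
    """Fetch an embedding vector for a piece of text from the embeddings endpoint."""
    body = {"model": "text-embedding-ada-002", "input": text}
    req = urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Usage, given description_a and description_b from the vision model:
# score = cosine_similarity(embed(description_a), embed(description_b))
```

Keep in mind this measures how similar the two *descriptions* are, not the pixels.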


Hi @arunantonyholmes ,
sure, if you ask GPT-4 Vision for cosine similarity, it won't give it to you. For that, you should use a Jupyter notebook and some Python. But as @pondin6666 suggested, you might not get what you want (I still don't understand the use case: what exactly do you want to measure the similarity of? Colors, shapes, or anything else?). You might want to first make a prompt template so the vision model returns a standardized text response (e.g.: people = True, sunshine = False, bicycles = False, etc.), and then embed the responses from both images and use cosine similarity. But there are plenty of potential approaches. Maybe we could give you a better suggestion if we knew what exactly your use case is.
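One cheap way to sketch that templated idea, without embeddings at all: parse the model's standardized answer into booleans and count agreements. The attribute names here are just the examples from above; adapt them to your use case.

```python
# Sketch: compare two images via a fixed attribute checklist that the
# vision model fills in for each image. Attribute names are examples only.
ATTRIBUTES = ["people", "sunshine", "bicycles"]

def parse_checklist(text: str) -> dict:
    """Parse lines like 'people = True' into a {name: bool} dict."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            result[key.strip().lower()] = value.strip().lower() == "true"
    return result

def checklist_similarity(a: dict, b: dict) -> float:
    """Fraction of attributes on which both checklists agree (0.0 to 1.0)."""
    matches = sum(1 for k in ATTRIBUTES if a.get(k) == b.get(k))
    return matches / len(ATTRIBUTES)
```

This gives a crude 0~1 score directly, at the cost of only measuring the criteria you thought to put in the template.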

I think I'd rather wait for OpenAI to provide an embedding option for the vision model than attempt, at potentially huge expense, to create a set of parallel embeddings from text interpretations, only to find that I missed some criterion and have to create all the data again :sweat_smile:

Depending on what you are comparing, it may be worth talking to the AI about ways to identify markers, using only text, that could translate to 0 or 1 as true/false. But there is better software currently that does that, which you can combine with vision to create many things.