How to compare the similarity of 2 images using the OpenAI API

How do I check the similarity of 2 images using OpenAI? For example, how do I check the similarity of photos taken of 2 printed images?

Hi @arunantonyholmes

You can include both images in an API call to the gpt-4-vision-preview model and ask it whether the images are similar and what category they belong to.
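A sketch of what such a request payload could look like, assuming the Chat Completions message format with `image_url` content parts (the model name follows the post above; the two base64 strings below are placeholders, not real encoded images):

```python
def build_comparison_payload(b64_a, b64_b):
    """Build a single chat request carrying both images side by side."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Are these two images similar, and what category does each belong to?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_a}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_b}"}},
            ],
        }],
        "max_tokens": 300,
    }

# Placeholder strings stand in for base64-encoded image data.
payload = build_comparison_payload("AAAA", "BBBB")
print(len(payload["messages"][0]["content"]))  # → 3: one text part plus two images
```

The point of building the content list this way is that both images travel in one user message, so the model can answer about them jointly.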

3 Likes

Hi @arunantonyholmes ,
depending on the purpose behind it (e.g. facial recognition, checking for common objects, semantic similarity, etc.), I can see two main approaches: you can use the GPT-4 Vision model’s API directly, or first convert your images into base64, then use the OpenAI embedding model, and then compare their vector embeddings using cosine similarity. But there might be other approaches, depending on the final purpose of your comparison, which would need a bit more explanation.

1 Like

Hi vasil,

I tried using the GPT-4 Vision model directly, but it doesn’t give a similarity measurement; it gives text describing the image. I also asked it for cosine similarity, but it won’t give the result, just text saying I should use some other approach. What I want is the similarity measurement, as you suggest in the second approach:

“first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using Cosine_similarity.” Could you explain it in detail? Is the OpenAI embedding model also available in the API?

OpenAI has an embeddings API endpoint, but embedding similarity is based on the tokenizer, so the similarity of base64-encoded strings may not be what you expect.

If you really need a similarity number from 0 to 1, you may need to use the vision API to produce descriptions of the two images first, then calculate similarity from the embeddings of those two strings.
If you don’t need such a precise number, you can define your similarity rules with simple examples in your instruction prompt and ask the model to reply based on the two images it sees.
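The first option (describe both images, then embed the descriptions) needs the cosine similarity computed client-side. A minimal sketch; the commented-out API calls are assumptions based on the `openai` Python SDK and are not part of the runnable portion:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors, in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Sketch of the full pipeline (requires an API key; not run here):
#
#   from openai import OpenAI
#   client = OpenAI()
#   # 1. get a text description of each image from the vision model
#   # 2. embed both descriptions
#   emb = client.embeddings.create(model="text-embedding-3-small",
#                                  input=[desc_a, desc_b])
#   score = cosine_similarity(emb.data[0].embedding, emb.data[1].embedding)

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # identical vectors → 1.0
```

Identical descriptions embed to (nearly) identical vectors and score close to 1; unrelated descriptions score lower.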

1 Like

Hi @arunantonyholmes ,
sure, if you ask GPT-4 Vision for cosine similarity it won’t give it to you. For that, you should use a Jupyter notebook and some Python. But as @pondin6666 suggested, you might not get what you want (I still don’t know the use case: what exactly do you want to measure similarity on? Colors, shapes, or anything else?). You might then want to first make a prompt template so the vision model returns a standardized text response (e.g.: people=True, sunshine=False, bicycles=False, etc.), and then embed the responses from both images and use cosine similarity. But there are plenty of potential approaches. We could give you a better suggestion if we knew exactly what your use case is.
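The standardized-response idea can be sketched like this. The checklist, the response format, and the parser are all hypothetical; in practice the `people=True, ...` strings would come back from the vision model, driven by your prompt template:

```python
# Hypothetical fixed checklist that the prompt template asks the model
# to answer for every image.
CHECKLIST = ["people", "sunshine", "bicycles"]

def parse_response(text):
    # Expects a response like "people=True, sunshine=False, bicycles=False".
    flags = {}
    for part in text.lower().split(","):
        key, value = part.split("=")
        flags[key.strip()] = value.strip()
    return [1 if flags[key] == "true" else 0 for key in CHECKLIST]

def overlap(a, b):
    # Fraction of checklist items on which the two images agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)

vec_a = parse_response("people=True, sunshine=False, bicycles=False")
vec_b = parse_response("people=True, sunshine=True, bicycles=False")
print(round(overlap(vec_a, vec_b), 2))  # → 0.67 (two of three items agree)
```

With a fixed checklist the responses become comparable vectors, so any vector similarity (simple agreement as above, or cosine similarity over embeddings) gives a repeatable score.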

I think I’d rather wait for OpenAI to provide an embedding option for the vision model than attempt, at potentially huge expense, to create a set of parallel embeddings from text interpretations, only to find that I missed some criterion and have to create all the data again :sweat_smile:

Depending on what you are comparing, it may be worth talking to the AI about ways to identify markers, using only text, that could translate to 0 or 1 as true/false. But there is better software currently that does that, which you can combine with vision to create many things.

Hi. What is the name of the software?

I use some of these and others I have not tested yet.

OpenCV: An open-source computer vision library that provides a comprehensive suite of tools for image processing, including image comparison. It’s widely used in AI projects for tasks like object detection, facial recognition, and image stitching.

TensorFlow/Keras: While primarily known for deep learning, TensorFlow and its high-level API Keras can be used to train models for image comparison tasks. You can use pre-trained models or design custom architectures for comparing images, such as using Siamese networks for identifying similarities between images.

Pillow (PIL): A Python Imaging Library that is more lightweight compared to OpenCV. It’s suitable for basic image processing tasks, such as resizing, cropping, and simple image comparison based on pixel differences.

Scikit-Image: A library from the SciPy ecosystem (separate from Scikit-Learn, despite the similar name) useful for various image processing tasks, including image segmentation, feature extraction, and comparison.

ImageMagick: A robust command-line tool for image manipulation, including comparison. It’s suitable for batch processing of images and supports a wide range of image formats.

DeepAI Image Comparison API: For those looking for an out-of-the-box solution, this API offers a way to compare images using advanced neural networks, providing detailed differences and similarity scores.

Dlib: Another powerful library often used in conjunction with OpenCV for facial recognition and alignment tasks. Dlib includes tools for image comparison, particularly useful in biometric applications.

Matchering: While not a traditional image comparison tool, this software can be used for matching the overall characteristics (color, exposure, etc.) of images to a reference image, which can be useful in comparison tasks.

Each of these tools has its strengths, depending on the specific requirements of your image comparison task, such as speed, accuracy, ease of use, or the ability to handle large datasets.
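As a toy illustration of the pixel-difference comparison that tools like Pillow or OpenCV perform on real images, here is a stdlib-only sketch that treats each image as a grid of 0-255 grayscale values and scores similarity as one minus the normalized mean absolute difference:

```python
def pixel_similarity(img_a, img_b):
    # Images as nested lists of 0-255 grayscale values; same dimensions required.
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("images must have the same dimensions")
    total_diff = sum(
        abs(a - b)
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    )
    pixels = len(img_a) * len(img_a[0])
    # 1.0 means identical, 0.0 means maximally different everywhere.
    return 1 - total_diff / (255 * pixels)

dark = [[0, 0], [0, 0]]
light = [[255, 255], [255, 255]]
print(pixel_similarity(dark, dark))   # identical images → 1.0
print(pixel_similarity(dark, light))  # opposite extremes → 0.0
```

Real libraries do the same kind of arithmetic over full-size images (and offer far better metrics, such as SSIM in Scikit-Image), but the principle is this simple.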

Then, after using these, you can have the AI see the results for understanding.

1 Like

The standard solution is to create an image description using a vision model and then create embeddings of the descriptions.

The resulting vectors can be compared as needed.

All of this can be done with the OpenAI APIs.

That works to a degree, but you are limited by what details you ask for, and by textual detail alone. With these tools you can get details that GPT can’t. For example, in cancer research using images you may be looking for similar contours or color patterns. GPT vision can’t give exact shades of colors or the exact mathematical shape of contours. So each tool has its use case, depending on what you are doing.


1 Like

I saw the original question about solutions to this problem from OpenAI and decided to go with the obvious one.

Advanced techniques come with their own challenges and costs. The generalist CV models are a starting point to learn about the limitations that need to be overcome.

3 Likes

Yeah, but only one image at a time:

```python
import base64
import requests

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image (raw string so the backslashes aren't treated as escapes)
image_path = r"c:\Users\diego\OneDrive - ITESO\ITESO\Verano 2024\Analisis de Regresion\Images\Image1.png"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Interpret the p-values to determine whether or not we reject the ANOVA null hypothesis. Something like alpha1 > alpha2 > alpha3"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 4096
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
```

Here is a recent example from the OpenAI cookbook to compare two images:

… we directly embed images for similarity search, bypassing the lossy process of text captioning, to boost retrieval accuracy.

Using CLIP-based embeddings further allows fine-tuning with specific data or updating with unseen images.

2 Likes

One image at a time is only a limit of your code. You can batch-process everything in parallel, up to your per-minute rate limits, allowing you to process x frames per second. This goes for almost all inputs, even with other tools, letting you get all the results at the same time to compare.
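A minimal sketch of that fan-out with the standard library's thread pool; `describe_image` here is a stand-in stub, where a real version would call the vision API (and would still be subject to per-minute rate limits):

```python
from concurrent.futures import ThreadPoolExecutor

def describe_image(path):
    # Stub standing in for a real vision-API call on one image.
    return f"description of {path}"

paths = ["img1.png", "img2.png", "img3.png"]

# map() preserves input order, so results line up with paths
# even though the calls run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    descriptions = list(pool.map(describe_image, paths))

print(descriptions[0])  # → description of img1.png
```

Threads work well here because the real workload is network-bound; throttle `max_workers` (or add backoff) to stay under your rate limits.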

2 Likes

I am looking for a solution in which two images of the same equipment or location (before and after pictures) are analyzed to determine whether the equipment (like a pump or generator) or location (like a swimming pool) is cleaner in the after picture than in the before one, once service is done. Sufficient training images can be provided. Please let me know if there is a solution available.

1 Like

Hi!

Please create a new topic for this question!
I am also interested in a possible solution.

1 Like