How to compare the similarity of 2 images using the OpenAI API

How do I check the similarity of 2 images using OpenAI? For example, how do I check the similarity of photos taken of 2 printed images?

Hi @arunantonyholmes

You can include both images in an API call to the gpt-4-vision-preview model and ask it whether the images are similar and what category they belong to.
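A sketch of what such a request payload could look like, assuming the Chat Completions message format with `image_url` content parts (the model name follows the post above; the two base64 strings below are placeholders, not real encoded images):

```python
def build_comparison_payload(b64_a, b64_b):
    """Build a single chat request carrying both images side by side."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Are these two images similar, and what category does each belong to?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_a}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_b}"}},
            ],
        }],
        "max_tokens": 300,
    }

# Placeholder strings stand in for base64-encoded image data.
payload = build_comparison_payload("AAAA", "BBBB")
print(len(payload["messages"][0]["content"]))  # → 3: one text part plus two images
```

The point of building the content list this way is that both images travel in one user message, so the model can answer about them jointly.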

3 Likes

Hi @arunantonyholmes ,
depending on the purpose behind it (e.g. facial recognition, checking for common objects, semantic similarity, etc.), I can see two main approaches: you can use the GPT-4 Vision model’s API directly, or first convert your images into base64, then use the OpenAI embedding model, and then compare their vector embeddings using cosine similarity. But there might be other approaches, depending on the final purpose of your comparison, which would need a bit more explanation.

1 Like

Hi vasil,

I tried using the GPT-4 Vision model directly, but it doesn’t give a similarity measurement; it gives text describing the image. I also asked it for cosine similarity, but it won’t give the result, just text saying I should use some other approach. What I want is the similarity measurement, as you suggest in the second approach:

“first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using Cosine_similarity.” Could you explain it in detail? Is the OpenAI embedding model also available in the API?

OpenAI has an embeddings API endpoint, but embedding similarity is based on the tokenizer, so the similarity of base64-encoded strings may not be what you expect.

If you really need a similarity number from 0 to 1, you may need to use the vision API to produce descriptions of the two images first, then calculate similarity from the embeddings of those two strings.
If you don’t need such a precise number, you can define your similarity rules with simple examples in your instruction prompt and ask the model to reply based on the two images it sees.
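The first option (describe both images, then embed the descriptions) needs the cosine similarity computed client-side. A minimal sketch; the commented-out API calls are assumptions based on the `openai` Python SDK and are not part of the runnable portion:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors, in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Sketch of the full pipeline (requires an API key; not run here):
#
#   from openai import OpenAI
#   client = OpenAI()
#   # 1. get a text description of each image from the vision model
#   # 2. embed both descriptions
#   emb = client.embeddings.create(model="text-embedding-3-small",
#                                  input=[desc_a, desc_b])
#   score = cosine_similarity(emb.data[0].embedding, emb.data[1].embedding)

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # identical vectors → 1.0
```

Identical descriptions embed to (nearly) identical vectors and score close to 1; unrelated descriptions score lower.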

1 Like

Hi @arunantonyholmes ,
sure, if you ask GPT-4 Vision for cosine similarity it won’t give it to you. For that, you should use a Jupyter notebook and some Python. But as @pondin6666 suggested, you might not get what you want (I still don’t know the use case: what exactly do you want to measure similarity on? Colors, shapes, or anything else?). You might then want to first make a prompt template so the vision model returns a standardized text response (e.g.: people=True, sunshine=False, bicycles=False, etc.), and then embed the responses from both images and use cosine similarity. But there are plenty of potential approaches. We could give you a better suggestion if we knew exactly what your use case is.
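The standardized-response idea can be sketched like this. The checklist, the response format, and the parser are all hypothetical; in practice the `people=True, ...` strings would come back from the vision model, driven by your prompt template:

```python
# Hypothetical fixed checklist that the prompt template asks the model
# to answer for every image.
CHECKLIST = ["people", "sunshine", "bicycles"]

def parse_response(text):
    # Expects a response like "people=True, sunshine=False, bicycles=False".
    flags = {}
    for part in text.lower().split(","):
        key, value = part.split("=")
        flags[key.strip()] = value.strip()
    return [1 if flags[key] == "true" else 0 for key in CHECKLIST]

def overlap(a, b):
    # Fraction of checklist items on which the two images agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)

vec_a = parse_response("people=True, sunshine=False, bicycles=False")
vec_b = parse_response("people=True, sunshine=True, bicycles=False")
print(round(overlap(vec_a, vec_b), 2))  # → 0.67 (two of three items agree)
```

With a fixed checklist the responses become comparable vectors, so any vector similarity (simple agreement as above, or cosine similarity over embeddings) gives a repeatable score.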

I think I’d rather wait for OpenAI to provide an embedding option for the vision model than attempt, at potentially huge expense, to create a set of parallel embeddings from text interpretations, only to find that I missed some criterion and have to create all the data again :sweat_smile:

Depending on what you are comparing, it may be worth talking to the AI about ways to identify markers, using only text, that could translate to 0 or 1 as true/false. But there is better software currently that does that, which you can combine with vision to create many things.

Hi. What is the name of the software?

I use some of these and others I have not tested yet.

OpenCV: An open-source computer vision library that provides a comprehensive suite of tools for image processing, including image comparison. It’s widely used in AI projects for tasks like object detection, facial recognition, and image stitching.

TensorFlow/Keras: While primarily known for deep learning, TensorFlow and its high-level API Keras can be used to train models for image comparison tasks. You can use pre-trained models or design custom architectures for comparing images, such as using Siamese networks for identifying similarities between images.

Pillow (PIL): A Python Imaging Library that is more lightweight compared to OpenCV. It’s suitable for basic image processing tasks, such as resizing, cropping, and simple image comparison based on pixel differences.

Scikit-Image: A library from the SciPy ecosystem (separate from Scikit-Learn, despite the similar name) useful for various image processing tasks, including image segmentation, feature extraction, and comparison.

ImageMagick: A robust command-line tool for image manipulation, including comparison. It’s suitable for batch processing of images and supports a wide range of image formats.

DeepAI Image Comparison API: For those looking for an out-of-the-box solution, this API offers a way to compare images using advanced neural networks, providing detailed differences and similarity scores.

Dlib: Another powerful library often used in conjunction with OpenCV for facial recognition and alignment tasks. Dlib includes tools for image comparison, particularly useful in biometric applications.

Matchering: While not a traditional image comparison tool, this software can be used for matching the overall characteristics (color, exposure, etc.) of images to a reference image, which can be useful in comparison tasks.

Each of these tools has its strengths, depending on the specific requirements of your image comparison task, such as speed, accuracy, ease of use, or the ability to handle large datasets.
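As a toy illustration of the pixel-difference comparison that tools like Pillow or OpenCV perform on real images, here is a stdlib-only sketch that treats each image as a grid of 0-255 grayscale values and scores similarity as one minus the normalized mean absolute difference:

```python
def pixel_similarity(img_a, img_b):
    # Images as nested lists of 0-255 grayscale values; same dimensions required.
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("images must have the same dimensions")
    total_diff = sum(
        abs(a - b)
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    )
    pixels = len(img_a) * len(img_a[0])
    # 1.0 means identical, 0.0 means maximally different everywhere.
    return 1 - total_diff / (255 * pixels)

dark = [[0, 0], [0, 0]]
light = [[255, 255], [255, 255]]
print(pixel_similarity(dark, dark))   # identical images → 1.0
print(pixel_similarity(dark, light))  # opposite extremes → 0.0
```

Real libraries do the same kind of arithmetic over full-size images (and offer far better metrics, such as SSIM in Scikit-Image), but the principle is this simple.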

Then, after using these, you can have the AI see the results for understanding.

1 Like

The standard solution is to create an image description using a vision model and then create embeddings of the descriptions.

The resulting vectors can be compared as needed.

All of this can be done with the OpenAI APIs.

That works to a degree, but you are limited by what details you ask for, and by textual detail alone. With these tools you can get details that GPT can’t. For example, in cancer research using images you may be looking for similar contours or color patterns. GPT vision can’t give exact shades of colors or the exact mathematical shape of contours. So each tool has its use case, depending on what you are doing.


1 Like

I saw the original question about solutions to this problem from OpenAI and decided to go with the obvious one.

Advanced techniques come with their own challenges and costs. The generalist CV models are a starting point to learn about the limitations that need to be overcome.

3 Likes

Yeah, but only one image at a time:

```python
import base64
import requests

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image (raw string so the backslashes aren't treated as escapes)
image_path = r"c:\Users\diego\OneDrive - ITESO\ITESO\Verano 2024\Analisis de Regresion\Images\Image1.png"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Interpret the p-values to determine whether or not we reject the ANOVA null hypothesis. Something like alpha1 > alpha2 > alpha3"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 4096
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
```

Here is a recent example from the OpenAI cookbook to compare two images:

… we directly embed images for similarity search, bypassing the lossy process of text captioning, to boost retrieval accuracy.

Using CLIP-based embeddings further allows fine-tuning with specific data or updating with unseen images.

2 Likes

One image at a time is only a limit of your code. You can batch-process everything in parallel, up to your per-minute rate limits, allowing you to process x frames per second. This goes for almost all inputs, even with other tools, letting you get all the results at the same time to compare.
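A minimal sketch of that fan-out with the standard library's thread pool; `describe_image` here is a stand-in stub, where a real version would call the vision API (and would still be subject to per-minute rate limits):

```python
from concurrent.futures import ThreadPoolExecutor

def describe_image(path):
    # Stub standing in for a real vision-API call on one image.
    return f"description of {path}"

paths = ["img1.png", "img2.png", "img3.png"]

# map() preserves input order, so results line up with paths
# even though the calls run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    descriptions = list(pool.map(describe_image, paths))

print(descriptions[0])  # → description of img1.png
```

Threads work well here because the real workload is network-bound; throttle `max_workers` (or add backoff) to stay under your rate limits.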

2 Likes

I am looking for a solution in which two images of the same equipment or location (before and after pictures) are analyzed to determine whether the equipment (like a pump or generator) or location (like a swimming pool) is cleaner in the after picture than in the before one, once service is done. Sufficient training images can be provided. Please let me know if there is a solution available.

1 Like

Hi!

Please create a new topic for this question!
I am also interested in a possible solution.

1 Like