How can I check the similarity of two images using OpenAI? For example, how would I compare photos taken of two printed images?
You can include both images in an API call to the gpt-4-vision-preview model and ask it whether the images are similar and what category they belong to.
Hi @arunantonyholmes ,
depending on the purpose behind it (e.g. facial recognition, checking for common objects, semantic similarity, etc.), I can see two main approaches: you can use the GPT-4 Vision model's API directly, or you can first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using cosine similarity. There may be other approaches depending on the final purpose of your comparison, which you could explain a bit more.
Hi vasil,
Here I tried using the GPT-4 Vision model directly, but it does not give a similarity measurement; it gives a text description of the image. I also asked it for cosine similarity, but it won't return a result, only text telling me to use some other approach. What I want is the similarity measurement you suggest in the second approach:
"first convert your images into base64, then use the OpenAI Embedding model, and then compare their vector embeddings using cosine similarity." Could you explain that in detail? Is the OpenAI Embedding model also available in the API?
OpenAI has an embeddings API endpoint, but embedding similarity is based on the tokenizer, so the similarity of two base64-encoded strings may not be what you expect.
If you really need a numeric 0–1 similarity, you may need to use the vision API to generate a description of each image first, then calculate similarity based on the embeddings of those two strings.
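The last step of that approach, comparing the two description embeddings, comes down to cosine similarity. A minimal sketch in plain Python; the toy vectors below stand in for embeddings that would actually come from OpenAI's embeddings endpoint:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the embeddings of two image descriptions
emb_desc_1 = [0.1, 0.3, 0.5]
emb_desc_2 = [0.1, 0.28, 0.52]

score = cosine_similarity(emb_desc_1, emb_desc_2)
```

In practice the vectors would each have hundreds or thousands of dimensions, but the formula is unchanged.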
If you don't need a scientifically rigorous number, you can define your own similarity rules with simple examples in your instruction prompts and ask the model to reply based on the two images it sees.
Hi @arunantonyholmes ,
sure, if you ask GPT-4 Vision for cosine similarity it won't give it to you. For that, you should use a Jupyter notebook and some Python. But as @pondin6666 suggested, you might not get what you want (I still don't understand the use case: what exactly do you want to measure similarity of? Colors, shapes, or something else?). You might want to first create a prompt template so the vision model returns a standardized text response (e.g. people = True, sunshine = False, bicycles = False, etc.), then embed the responses from both images and use cosine similarity. But there are plenty of potential approaches. Maybe we could give you a better suggestion if we knew what exactly your use case is.
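The prompt-template idea can be sketched roughly as follows. This is an illustration, not a drop-in solution: the attribute list, prompt wording, and model name are assumptions, and the actual HTTP call (which requires a valid API key) is left out. The first function builds a Chat Completions payload that forces a standardized reply; the second turns that reply into a 0/1 vector you can compare across images:

```python
# Hypothetical attribute checklist; adjust to your own use case.
ATTRIBUTES = ["people", "sunshine", "bicycles"]

def build_vision_request(image_b64):
    # Chat Completions payload asking the vision model for one
    # "name = True" or "name = False" line per attribute.
    prompt = (
        "For the attached image, answer one line per attribute, "
        "formatted exactly as 'name = True' or 'name = False'. "
        "Attributes: " + ", ".join(ATTRIBUTES)
    )
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }

def parse_flags(model_reply):
    # Turn "people = True\nsunshine = False\n..." into a 0/1 vector.
    flags = {}
    for line in model_reply.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            flags[name.strip().lower()] = value.strip().lower() == "true"
    return [1.0 if flags.get(a, False) else 0.0 for a in ATTRIBUTES]
```

You would POST `build_vision_request(...)` to the chat completions endpoint once per image, run `parse_flags` on each reply, and then compare the two vectors (directly, or after embedding richer free-text responses) with cosine similarity.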
I think I'd rather wait for OpenAI to provide an embedding option for the vision model than attempt, at potentially huge expense, to create a set of parallel embeddings from text interpretations, only to find that I missed some criterion and have to create all the data again.
Depending on what you are comparing, it may be worth talking to the AI about ways to identify markers using only text that translate to true/false (0 or 1). But there is better software currently available for that, which you can combine with vision to build many things.
Hi. What is the name of the software?
I use some of these; others I have not tested yet.
OpenCV: An open-source computer vision library that provides a comprehensive suite of tools for image processing, including image comparison. It's widely used in AI projects for tasks like object detection, facial recognition, and image stitching.
TensorFlow/Keras: While primarily known for deep learning, TensorFlow and its high-level API Keras can be used to train models for image comparison tasks. You can use pre-trained models or design custom architectures for comparing images, such as using Siamese networks for identifying similarities between images.
Pillow (PIL): A Python Imaging Library that is more lightweight compared to OpenCV. It's suitable for basic image processing tasks, such as resizing, cropping, and simple image comparison based on pixel differences.
Scikit-Image: Part of the broader Scikit-Learn library, Scikit-Image is useful for various image processing tasks, including image segmentation, feature extraction, and comparison.
ImageMagick: A robust command-line tool for image manipulation, including comparison. It's suitable for batch processing of images and supports a wide range of image formats.
DeepAI Image Comparison API: For those looking for an out-of-the-box solution, this API offers a way to compare images using advanced neural networks, providing detailed differences and similarity scores.
Dlib: Another powerful library often used in conjunction with OpenCV for facial recognition and alignment tasks. Dlib includes tools for image comparison, particularly useful in biometric applications.
Matchering: While not a traditional image comparison tool, this software can be used for matching the overall characteristics (color, exposure, etc.) of images to a reference image, which can be useful in comparison tasks.
Each of these tools has its strengths, depending on the specific requirements of your image comparison task, such as speed, accuracy, ease of use, or the ability to handle large datasets.
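As a contrast to the description-based approach, the pixel-level comparison these libraries perform (far more robustly, e.g. scikit-image's structural similarity) can be illustrated in a few lines of plain Python. The "images" here are toy grayscale matrices; real code should use one of the tools above:

```python
def mean_pixel_similarity(img_a, img_b, max_val=255):
    # Mean absolute pixel difference, rescaled so 1.0 = identical images.
    # Assumes equal-sized grayscale images as nested lists of 0..max_val.
    total_diff = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total_diff += abs(pa - pb)
            count += 1
    return 1.0 - (total_diff / count) / max_val

image1 = [[0, 128], [255, 64]]
image2 = [[0, 128], [255, 64]]   # identical copy
image3 = [[255, 128], [0, 64]]   # two pixels flipped
```

This kind of metric captures exact shades and positions that a vision model's text description cannot, which is exactly the trade-off discussed above.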
Then, after using these, you can have the AI review the results for interpretation.
The standard solution is to create an image description using a vision model and then create embeddings of the descriptions.
The resulting vectors can be compared as needed.
All of this can be done with the OpenAI APIs.
That works to a degree, but you are limited by what details you ask for, and to textual detail only. With these tools you can get details that GPT can't. For example, in cancer research using images you may be looking for similar contours or color patterns; GPT vision can't measure exact shades of color or the exact mathematical shape of contours. So each approach has its use case, depending on what you are doing.
I saw the original question about solutions to this problem from OpenAI and decided to go with the obvious one.
Advanced techniques come with their own challenges and costs. The generalist CV models are a starting point to learn about the limitations that need to be overcome.
Yeah, but only one image at a time:
import base64
import requests

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image (raw string so the backslashes are not treated as escapes)
image_path = r"c:\Users\diego\OneDrive - ITESO\ITESO\Verano 2024\Analisis de Regresion\Images\Image1.png"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Interpret the p-values to determine whether or not we reject the ANOVA null hypothesis. Something like alpha1 > alpha2 > alpha3."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 4096
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
Here is a recent example from the OpenAI cookbook to compare two images:
… we directly embed images for similarity search, bypassing the lossy process of text captioning, to boost retrieval accuracy.
Using CLIP-based embeddings further allows fine-tuning with specific data or updating with unseen images.
One image at a time is only a limit of your code. You can batch-process everything in parallel, up to your per-minute transaction limits, allowing you to process x frames per second. This applies to almost all inputs; with other tools as well, you can get all the results at the same time to compare.
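For reference, nothing in the Chat Completions API limits a single request to one image: the `content` array of a user message can carry several `image_url` parts. A sketch of a two-image payload (the base64 strings below are placeholders, and the question text is just an example):

```python
def build_two_image_payload(b64_img_1, b64_img_2,
                            question="How similar are these two images, and why?"):
    # One user message whose content mixes a text part with two image parts.
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64_img_1}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64_img_2}"}},
            ],
        }],
        "max_tokens": 1024,
    }

payload = build_two_image_payload("PLACEHOLDER_1", "PLACEHOLDER_2")
```

POST this to the same chat completions endpoint as a single-image request; the model sees both images in one turn and can compare them directly.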
I am looking for a solution in which two images of the same equipment or location (before and after pictures) are analyzed to determine whether the equipment (like a pump or generator) or location (like a swimming pool) is cleaner in the after picture than in the before picture, once service is done. Sufficient training images can be provided. Please let me know if there is a solution available.
Hi!
Please create a new topic for this question!
I am also interested in a possible solution.