Get embeddings for images

Hi there,

is there a way to get the embeddings for images via the API?
I would like to store them in my vector DB, but I don't want to mix embedding models. If it's not possible, I'll have to calculate my own embeddings for both text and images.

Kind regards

kindof


Hi!
Welcome to the community!
OpenAI’s text embeddings measure the relatedness of text strings.
If you want to create embeddings for images, you need to use another model. You can check Hugging Face as a reference. Here is a link to get you started:


Hi vb,

was hoping to avoid this :sweat_smile: but thanks for the HF link!

Kind regards

kindOf


You can look for image embedding models at Replicate. Here's one of them: https://replicate.com/daanelson/imagebind. We use it to detect deepfakes at kazimir.ai.

For image embeddings, I am using Titan Multimodal Embeddings Generation 1, available via API in AWS.

It's working well for me so far at classifying images: I correlate a new image against previously labeled images and determine the best-fit label for it.

You can also mix text and the image together (multimodal), but I am using it without text to get a raw image embedding.
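In case it helps anyone trying this route: a minimal sketch of calling Titan Multimodal Embeddings G1 through Bedrock with `boto3`. The request field names (`inputImage`, `inputText`, `embeddingConfig`) and the model ID follow the AWS docs as I understand them; treat them as assumptions and double-check against your region's documentation.

```python
import base64
import json


def titan_image_request(image_bytes, text=None, dim=1024):
    """Build the JSON request body for Titan Multimodal Embeddings G1.

    If `text` is given, the model mixes text and image (multimodal);
    without it you get a raw image embedding.
    """
    body = {
        "inputImage": base64.b64encode(image_bytes).decode("utf-8"),
        "embeddingConfig": {"outputEmbeddingLength": dim},
    }
    if text:
        body["inputText"] = text
    return json.dumps(body)


def embed_image(image_bytes, region="us-east-1"):
    """Invoke Bedrock (needs AWS credentials; not executed here)."""
    import boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=titan_image_request(image_bytes),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(resp["body"].read())["embedding"]
```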

As it stands, there is no direct image embedding model from OpenAI. The closest you can get is to use GPT-4V to generate a text description of the image, and then embed that text. But this is too much compression for my use case, and not cheap either.
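For anyone who wants to try the caption-then-embed workaround anyway, here is a rough sketch using the `openai` Python client. The model names (`gpt-4-vision-preview`, `text-embedding-3-small`) and the prompt are assumptions; substitute whatever vision and embedding models your account has access to.

```python
import base64


def build_vision_messages(image_b64, prompt="Describe this image in detail."):
    """Pure helper: build the chat payload for a vision request."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]


def caption_then_embed(image_bytes):
    """Caption an image with GPT-4V, then embed the caption.

    Requires the `openai` package and an API key; not executed here.
    """
    from openai import OpenAI

    client = OpenAI()
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    caption = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=build_vision_messages(b64),
    ).choices[0].message.content
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=caption,
    )
    return caption, emb.data[0].embedding
```

Keep in mind the caveat above: everything the caption omits is invisible to retrieval afterwards.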


Do image embeddings work well alongside text embeddings? A common use case is RAG retrieval over documentation that contains screenshots.

Very often the screenshots contain critical information that is lost with text embeddings alone. How good are image embeddings when a user queries for information shown in a screenshot, say UI configurations?

GPT-4V sounds like a promising workaround, but I haven't tried its performance yet for RAG applications.


Maybe query GPT-4 Vision to describe the image in as much detail as it can, and then use that text to create an embedding of the caption for that image. Then, whenever you use RAG, if the embedding of the image pops up, substitute in the actual image and send it along with the response.


Did you get an answer for this? @kingsframe

I have the same idea. Use GPT-4 to get a description with visual, cultural, contextual, and semantic meaning (if possible) and feed that to the embeddings API.

Then do the same for query strings and match embeddings in a vector database to pull the best results.
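The matching step you describe reduces to a nearest-neighbor lookup. A minimal in-memory sketch, assuming plain cosine similarity over a small dict instead of a real vector database:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def best_match(query_emb, stored):
    """stored: dict of id -> embedding; returns (id, score) of closest item."""
    return max(
        ((key, cosine(query_emb, emb)) for key, emb in stored.items()),
        key=lambda kv: kv[1],
    )
```

In practice a vector DB does exactly this at scale (usually with approximate nearest-neighbor indexes), so the same embedding model must be used for both the stored captions and the incoming queries.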