Simple text embedding or CLIP for RAG?

javi.stauffenberg · May 1, 2024, 4:52pm

Hello everyone,
I realise this question has probably already been answered somewhere else here, but unfortunately I can’t find a clear answer…

I’d like to code a small RAG module that finds me the correct row from a data structure. A data point is composed of an image, a description of the image and a few more tags - the ultimate goal being to get the image back for further usage.

The user will search with a text prompt. Would it be more appropriate to generate embeddings from the description only (the label), or from the images together with their corresponding labels (using CLIP)?

It’s not clear to me what the advantages of image+label embeddings are if we already have detailed labels, generated through GPT4Vision for example.
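To make the description-only option concrete, here is a rough sketch of what I have in mind. The rows, URLs and the `embed_text` function are all made up for illustration: `embed_text` is just a word-count placeholder so the snippet runs on its own, and a real text embedding model would take its place.

```python
import math
import re
from collections import Counter

# Hypothetical rows: image URL + description + tags (all values are made up).
ROWS = [
    {"image_url": "https://example.com/img/001.jpg",
     "description": "a red bicycle leaning against a brick wall",
     "tags": ["bicycle", "street"]},
    {"image_url": "https://example.com/img/002.jpg",
     "description": "two cats sleeping on a grey sofa",
     "tags": ["cat", "indoor"]},
    {"image_url": "https://example.com/img/003.jpg",
     "description": "a snowy mountain peak at sunrise",
     "tags": ["mountain", "landscape"]},
]


def embed_text(text):
    """Placeholder embedding (word counts); swap in a real text embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Embed description plus tags once per row; the image itself is never embedded.
INDEX = [(embed_text(r["description"] + " " + " ".join(r["tags"])), r) for r in ROWS]


def search(query):
    """Return (image_url, score) of the row whose text is closest to the query."""
    q = embed_text(query)
    score, row = max(((cosine(q, vec), r) for vec, r in INDEX), key=lambda p: p[0])
    return row["image_url"], score
```

Calling `search("cats on a sofa")` would hand back the URL of the cat row, which is all I need downstream. The image only travels along as a field of the matched row.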

Appreciate any help, have a good day!

javi.stauffenberg · May 8, 2024, 6:15pm

I’m bumping this up in case someone has an answer to the question.

_j · May 8, 2024, 6:34pm

You want to retrieve an image.

If an image is not one of the inputs being matched, then it does not make sense to use an image embedding model.

Embeddings-based search returns the “closest” match, not necessarily the “correct” one.
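A small sketch of the “closest, not correct” point, under stated assumptions: the three-dimensional vectors are toy stand-ins for real description embeddings, the file names are invented, and the `min_score` cutoff of 0.8 used below is an arbitrary value chosen for the demo, not a recommended threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy vectors standing in for embeddings of each image's description.
INDEX = {
    "bicycle.jpg": [0.9, 0.1, 0.0],
    "cats.jpg": [0.1, 0.9, 0.1],
    "mountain.jpg": [0.0, 0.2, 0.9],
}


def nearest(query_vec, min_score=None):
    """Return (name, score) of the most similar entry.

    Without min_score there is always a winner, however weak the match.
    With min_score, name is None when the best score falls below the cutoff.
    """
    name, score = max(
        ((n, cosine(query_vec, v)) for n, v in INDEX.items()),
        key=lambda p: p[1],
    )
    if min_score is not None and score < min_score:
        return None, score
    return name, score
```

A query vector such as `[0.6, 0.5, 0.6]`, which resembles none of the entries in particular, still gets `mountain.jpg` back from `nearest(...)`. Passing `min_score=0.8` turns that weak match into `None`, which is one way to let the caller decide what counts as good enough.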

javi.stauffenberg · May 8, 2024, 7:06pm

Hey,
Thanks for the reply! If I’m not wrong, CLIP eventually does a semantic translation of the image features, so wouldn’t that be the same as simply finding the closest matching label (provided by GPT4V, for example) stored in the same data collection as the image URL to retrieve?
