Hi,

I’m currently using OpenAI embeddings to index some texts, and I’ve been tinkering with OpenAI’s CLIP, which would let me index images as well.

Questions:

  • Does it make sense to average OpenAI embeddings with OpenAI CLIP embeddings? Will semantic search performance be degraded or improved?

The bigger context is that I use Postgres to index my vectors, and there’s a chance I’ll embed my content with multiple models (e.g. CLIP, OpenAI embeddings, Sentence-BERT), but I’d prefer to only deal with a single column for the vector.
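On the averaging question, one practical wrinkle worth noting: text-embedding-ada-002 returns 1536-dim vectors, while CLIP’s common ViT checkpoints return 512 (ViT-B/32) or 768 (ViT-L/14) dims, so a plain element-wise average isn’t even defined without first bringing both into a common width. A minimal sketch with random stand-in vectors (only the dimensions are taken from the models; the zero-padding is just one naive way to make the shapes match, and whether the blend helps retrieval would have to be measured on real data):

```python
import numpy as np

rng = np.random.default_rng(0)

openai_emb = rng.normal(size=1536)  # text-embedding-ada-002 width
clip_emb = rng.normal(size=512)     # CLIP ViT-B/32 width

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Naive dimensional patch: pad the CLIP vector with zeros up to 1536.
# The two models' embedding spaces are unrelated, so this only fixes
# the shapes -- it says nothing about semantic compatibility.
padded_clip = np.zeros(1536)
padded_clip[:512] = clip_emb

# Normalize both, average, and re-normalize so the result is unit-length
# (cosine search in pgvector doesn't require this, but it keeps the
# blend from being dominated by whichever vector had the larger norm).
blended = l2_normalize((l2_normalize(openai_emb) + l2_normalize(padded_clip)) / 2)
```
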

This is my table:

create table documents (
    id text primary key,
    data text,
    embedding vector(1536),
    hash text,
    dataset_id text,
    user_id text,
    metadata json
);

create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

I’m not sure I can use json for the embeddings (something like {"clip": [...], "openai_embeddings": [...]}); I’d need to index it dynamically or something.
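For what it’s worth, ivfflat only indexes `vector`-typed columns, so embeddings stashed inside a json column couldn’t use that index. One workaround (my own assumption, not an established pgvector pattern) that keeps a single column: zero-pad every model’s output to the column width and tag each row with its model name (e.g. in the existing metadata json), filtering by model at query time so vectors from different spaces are never compared. Padding is lossless for cosine ranking within one model, which this quick check confirms:

```python
import numpy as np

rng = np.random.default_rng(1)

COLUMN_DIM = 1536  # width of the single vector(1536) column

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pad(v, dim=COLUMN_DIM):
    """Zero-pad a shorter embedding up to the shared column width."""
    return np.concatenate([v, np.zeros(dim - v.size)])

# Two stand-in vectors from the same (hypothetical) 512-dim model:
a, b = rng.normal(size=512), rng.normal(size=512)

# Appending zeros changes neither the dot product nor the norms,
# so the cosine similarity -- and hence vector_cosine_ops ranking --
# is unchanged after padding.
same_angle = abs(cos(a, b) - cos(pad(a), pad(b))) < 1e-12
```
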

Well, that’s a bit of a messy thought. Even if the schema part is too open-ended, I’d appreciate help with this question:

  • Does it make sense to average OpenAI embeddings with OpenAI CLIP embeddings? Will semantic search performance be degraded or improved?