Using new embedding models

hey everyone,

We are currently using Supabase as our vector DB, and all our embeddings were generated with text-embedding-ada-002. Now that the new, improved embedding model text-embedding-3-large has been released, do we need to regenerate the embeddings for our data with this model, or is there a way to reuse or convert the existing embeddings we generated with text-embedding-ada-002?

thanks

It seems you can do some sneaky math here: request the full 3072 dimensions from 3-large once, then cut each vector down to its first 1024 values and re-normalize it to unit length. That gives you both the full 3072-dimension search and a faster 1024-dimension search from a single embeddings run.
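A minimal sketch of that truncate-and-renormalize step, assuming the full-size vectors are already in hand as plain lists of floats. The helper name `shorten_embedding` and the toy 4-value vector are made up for illustration; real 3-large vectors would have 3072 values.

```python
import math

def shorten_embedding(vec, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # avoid dividing by zero on an all-zero vector
    return [x / norm for x in head]

# Toy 4-value stand-in for a full 3072-dimension embedding
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
```

Both versions can then be stored side by side, for example in two vector columns, so the shorter one handles a fast first pass and the full one handles re-ranking.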

Is there a Python example of using the text-embedding-3-large model with reduced dimensions?

There’s only one extra parameter to add: dimensions.

Here’s code that adds a smaller “dimensions” value to the request, but only when the value is one listed for the chosen model.

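# Example inputs (placeholders; replace with your own values)
model = "3-large"
dimensions = 1024
embedding_input = ["text to embed"]

# Output sizes this snippet accepts for each model
supported_models = {
    "ada-002": [1536],
    "3-small": [512, 1536],
    "3-large": [256, 1024, 3072]
}

# Base request; "base64" returns each vector as encoded raw 32-bit floats
params = {
    "model": f"text-embedding-{model}",
    "input": embedding_input,
    "encoding_format": "base64"
}

# Send "dimensions" only when asking for less than the model's full size
if model in supported_models and dimensions in supported_models[model]:
    if dimensions != max(supported_models[model]):
        params["dimensions"] = dimensions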

What to do with (**params), or with the base64 return of raw 32-bit floats, I’ll leave as an exercise for the reader.
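For the base64 half of that exercise, here is a rough sketch using only the standard library. It assumes the returned string encodes little-endian 32-bit floats packed back to back; `decode_embedding` is a hypothetical helper name, and the sample string is built locally rather than taken from a real API response.

```python
import base64
import struct

def decode_embedding(b64_string):
    """Turn a base64 string of little-endian float32 values into a list of floats."""
    raw = base64.b64decode(b64_string)
    count = len(raw) // 4  # 4 bytes per 32-bit float
    return list(struct.unpack(f"<{count}f", raw))

# Round trip with made-up values that float32 represents exactly
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
decoded = decode_embedding(sample)
print(decoded)  # [0.5, -1.0, 2.0]
```

As for (**params): with the v1 `openai` Python client, the dictionary would typically be unpacked into `client.embeddings.create(**params)`, and each item's `embedding` field should then hold the base64 string to pass to a decoder like the one above. Treat that as an assumption to check against the client version you have installed.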

from mteb import MTEB
import openai

# Name of the OpenAI embedding model to evaluate
model_name = "text-embedding-3-large"

evaluation = MTEB(tasks=["CQADupstackPhysicsRetrieval"])
results = evaluation.run(model_name, output_folder=f"results_openai/{model_name}")

I wanted to reproduce the MTEB results, so I’m looking for a Python script that can load the OpenAI embedding models directly. Is there any way to do that?