Using new embedding models

Hey everyone,

We currently use Supabase as our vector DB, and all our embeddings were generated with text-embedding-ada-002. Now that the new, improved embedding model text-embedding-3-large has been released, do we need to regenerate the embeddings for our data with this model, or is there a way to reuse or convert the existing embeddings we generated with text-embedding-ada-002?

Thanks


It seems you can do some sneaky math: request the full 3072 dimensions of 3-large, then truncate and renormalize those vectors yourself, so that the full 3k search and a faster 1k search are both available from a single embeddings run.
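The "sneaky math" mentioned above is, as far as I understand it, keeping only the leading values of a text-embedding-3 vector and rescaling what is left to unit length. Here is a minimal sketch of that idea. The function name shorten_embedding and the toy 4-value vector are made up for illustration, and this does not carry over to text-embedding-ada-002 vectors.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale them to unit length.

    Hypothetical helper: assumes `vector` came from a text-embedding-3
    model, where the leading values carry most of the information.
    """
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


# Toy stand-in for a 3072-value embedding.
full_vector = [0.5, 0.5, 0.5, 0.5]
short_vector = shorten_embedding(full_vector, 2)
```

You would store both the full and the shortened vectors, search the short ones first for speed, and optionally re-rank the top hits with the full ones.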


Is there a Python example of using the text-embedding-3-large model with reduced dimensions?

There’s only one extra parameter to add: dimensions.

Here’s code that adds a smaller “dimensions” value to the request when you specify a supported one.

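# Example inputs; change these to suit your data.
model = "3-large"
dimensions = 1024
embedding_input = ["text to embed"]

# Output sizes this snippet accepts for each model.
supported_models = {
    "ada-002": [1536],
    "3-small": [512, 1536],
    "3-large": [256, 1024, 3072]
}

params = {
    "model": f"text-embedding-{model}",
    "input": embedding_input,
    "encoding_format": "base64"
}

# Send "dimensions" only when asking for fewer than the model's default size.
if model in supported_models and dimensions in supported_models[model]:
    if dimensions != max(supported_models[model]):
        params["dimensions"] = dimensions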

What to do with (**params), or with the base64 return of raw 32-bit floats, I’ll leave as an exercise for the reader.
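For anyone who wants a head start on that exercise, here is a rough sketch of the decoding half. It assumes the base64 string holds packed little-endian 32-bit floats; decode_embedding is a hypothetical helper name, and the sample string is built locally rather than fetched from the API.

```python
import base64
import struct


def decode_embedding(b64_string):
    """Turn a base64 string of packed float32 values into a list of floats.

    Assumption: the bytes are little-endian 32-bit floats.
    """
    raw = base64.b64decode(b64_string)
    count = len(raw) // 4  # 4 bytes per 32-bit float
    return list(struct.unpack(f"<{count}f", raw))


# With the openai package, the request itself would look roughly like:
#   response = client.embeddings.create(**params)
#   b64_string = response.data[0].embedding
# Below, a locally packed sample stands in for that string.
sample = base64.b64encode(struct.pack("<3f", 0.25, -0.5, 1.0)).decode("ascii")
decoded = decode_embedding(sample)
```

The values come back as ordinary Python floats, so they can be handed to numpy or written to a vector store as usual.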

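from mteb import MTEB
import openai

model_name = "text-embedding-3-large"
# Placeholder: evaluation.run() needs an object with an encode() method, not a string.
model = model_name

evaluation = MTEB(tasks=["CQADupstackPhysicsRetrieval"])
results = evaluation.run(model, output_folder=f"results_openai/{model_name}")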

I wanted to reproduce the MTEB results, so I’m looking for a Python script that can load the OpenAI embedding models directly. Is there any way to do that?
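One possible approach, offered as a sketch rather than a tested recipe: MTEB's run() accepts any object with an encode() method that returns one vector per input text, so a thin adapter around the OpenAI client may be enough. The names OpenAIEmbedder, embed and run_benchmark are made up here, the batch size of 64 is an arbitrary assumption, and the exact interface MTEB expects can differ between versions.

```python
class OpenAIEmbedder:
    """Hypothetical adapter exposing the encode() method that MTEB calls."""

    def __init__(self, client, model_name, batch_size=64):
        self.client = client          # e.g. openai.OpenAI()
        self.model_name = model_name
        self.batch_size = batch_size  # assumed value, tune as needed

    def embed(self, sentences):
        """Return one list of floats per sentence, requesting in batches."""
        sentences = list(sentences)
        vectors = []
        for start in range(0, len(sentences), self.batch_size):
            batch = sentences[start:start + self.batch_size]
            response = self.client.embeddings.create(
                model=self.model_name, input=batch
            )
            vectors.extend(item.embedding for item in response.data)
        return vectors

    def encode(self, sentences, **kwargs):
        # Extra keyword arguments passed by MTEB are ignored in this sketch.
        import numpy as np
        return np.asarray(self.embed(sentences), dtype=np.float32)


def run_benchmark(model_name="text-embedding-3-large"):
    """Needs the mteb and openai packages and OPENAI_API_KEY to be set."""
    from mteb import MTEB
    from openai import OpenAI

    model = OpenAIEmbedder(OpenAI(), model_name)
    evaluation = MTEB(tasks=["CQADupstackPhysicsRetrieval"])
    return evaluation.run(model, output_folder=f"results_openai/{model_name}")
```

Calling run_benchmark() would then send every text in the task through the API, so expect it to cost tokens and take a while.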

I just added an Embedding Manager to the FlexiAI framework. In 24 to 48 hours I will add an image manager and publish the build on PyPI.
In the docs you can find:

  • example of creating embeddings
  • example of finding similar texts
  • example of clustering texts
  • example of implementing semantic search
  • example of classifying texts
  • example of answering questions with context
  • example of sentiment classification with logistic regression
  • example of sentiment classification with logistic regression and data augmentation

See if this helps you: