Using new embedding models

dsdev · January 26, 2024, 9:44am

hey everyone,

So we are currently using Supabase as our vector DB, and all our embeddings were generated with text-embedding-ada-002. Now that the new, improved text-embedding-3-large model has been released, do we need to regenerate the embeddings for our data with it, or is there a way to reuse or convert the embeddings we already generated with text-embedding-ada-002?

thanks

_j · January 26, 2024, 9:49am

It seems you can do some sneaky math: request the full 3072 dimensions from 3-large once, then truncate and re-normalize those vectors to get 1024-dimension copies, so a single embeddings run gives you both full 3k search and faster 1k search.
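A minimal sketch of that math, assuming (as OpenAI describes for the text-embedding-3 models) that a shorter embedding can be obtained by keeping the first N components and rescaling them to unit length. The helper names shorten_embedding, dot and top_k are hypothetical, and plain Python lists stand in for whatever your vector store returns. Note that this only reshapes 3-large output; it does not turn existing ada-002 vectors into 3-large ones.

```python
import math
from typing import List, Sequence


def shorten_embedding(vector: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale them to unit (L2) length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k corpus vectors most similar to the query (brute force)."""
    scores = [dot(query, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
```

In practice you would store the full 3072-dimension vector from one API call and also store shorten_embedding(full, 1024) for the faster index; both copies come from the same embeddings run. Queries must be shortened to the same size as the index they are searched against.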

aksdesai1998 · January 27, 2024, 10:54am

Is there a Python example of using the text-embedding-3-large model with reduced dimensions?

_j · January 27, 2024, 11:13am

There's only one extra parameter to add: dimensions.

Here's code that adds "dimensions" to the request only when you specify a size the model supports and that size is smaller than the model's full output.

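# Output sizes this script allows per model. Only the text-embedding-3
# models accept a "dimensions" parameter; ada-002 always returns 1536.
supported_models = {
    "ada-002": [1536],
    "3-small": [512, 1536],
    "3-large": [256, 1024, 3072],
}

# Example values (illustrative; replace with your own)
model = "3-large"
dimensions = 1024
embedding_input = ["first text to embed", "second text to embed"]

params = {
    "model": f"text-embedding-{model}",
    "input": embedding_input,
    "encoding_format": "base64",
}

# Send "dimensions" only for a supported, reduced size; full size is the default.
if model in supported_models and dimensions in supported_models[model]:
    if dimensions != max(supported_models[model]):
        params["dimensions"] = dimensions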

What to do with **params, or with the base64 return of raw 32-bit floats, I'll leave as an exercise for the reader.
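One possible answer to that exercise, as a sketch: with the official openai Python package (v1 or later) the request would be client.embeddings.create(**params), and because encoding_format is "base64", each returned embedding should arrive as a base64 string rather than a list of floats. The decoder below assumes the payload is packed little-endian float32 values, 4 bytes per dimension; decode_base64_embedding is a hypothetical helper name, not part of the SDK.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(b64_data: str) -> List[float]:
    """Decode a base64 string of packed float32 values into a list of floats.

    Assumes little-endian float32, 4 bytes per dimension.
    """
    raw = base64.b64decode(b64_data)
    if len(raw) % 4 != 0:
        raise ValueError(f"payload is {len(raw)} bytes, not a multiple of 4")
    count = len(raw) // 4
    # "<" = little-endian, "f" = 32-bit float; one format char per dimension
    return list(struct.unpack(f"<{count}f", raw))


# Usage sketch (assumes the openai package v1+ and OPENAI_API_KEY set):
# from openai import OpenAI
# response = OpenAI().embeddings.create(**params)
# vectors = [decode_base64_embedding(item.embedding) for item in response.data]
```

At 4 bytes per float, a 1024-dimension embedding decodes from 4096 raw bytes, so the decoded length is a quick check that the requested "dimensions" was applied.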

aksdesai1998 · January 27, 2024, 11:30am

from mteb import MTEB
import openai

model_name = "text-embedding-3-large"
model = model_name  # only a name string; MTEB needs an object with an encode() method

evaluation = MTEB(tasks=["CQADupstackPhysicsRetrieval"])
results = evaluation.run(model, output_folder=f"results_openai/{model_name}")

I want to reproduce the MTEB results, so I'm looking for a Python script that can load the OpenAI embedding models directly. Is there any way to do that?
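A minimal sketch of one way to do it, assuming MTEB's custom-model interface, in which evaluation.run() accepts any object exposing an encode(sentences, batch_size=..., **kwargs) method that returns one embedding per sentence. OpenAIEmbedder is a hypothetical wrapper name; the client is passed in (an openai.OpenAI() instance in practice), so the class itself does not import either package. Method names and expected return types have changed between mteb releases, so check this against the version you have installed.

```python
from typing import Any, List, Optional, Sequence


class OpenAIEmbedder:
    """Hypothetical wrapper exposing an OpenAI embeddings model via encode()."""

    def __init__(self, client: Any, model: str = "text-embedding-3-large",
                 dimensions: Optional[int] = None) -> None:
        self.client = client          # e.g. openai.OpenAI()
        self.model = model
        self.dimensions = dimensions  # None = the model's full output size

    def encode(self, sentences: Sequence[str], batch_size: int = 32,
               **kwargs: Any) -> List[List[float]]:
        # Extra keyword arguments that MTEB may pass are accepted and ignored.
        texts = list(sentences)
        vectors: List[List[float]] = []
        for start in range(0, len(texts), batch_size):
            params = {"model": self.model, "input": texts[start:start + batch_size]}
            if self.dimensions is not None:
                params["dimensions"] = self.dimensions
            response = self.client.embeddings.create(**params)
            # Sort by index so the output order always matches the input order.
            for item in sorted(response.data, key=lambda d: d.index):
                vectors.append(list(item.embedding))
        return vectors


# Usage sketch (needs the openai and mteb packages and OPENAI_API_KEY set):
# from openai import OpenAI
# from mteb import MTEB
# evaluation = MTEB(tasks=["CQADupstackPhysicsRetrieval"])
# evaluation.run(OpenAIEmbedder(OpenAI()),
#                output_folder="results_openai/text-embedding-3-large")
```

The wrapper returns plain lists; if your mteb version expects a NumPy array, wrap the result with numpy.asarray. Passing dimensions=1024 or 256 lets you compare the reduced sizes on the same task.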

razvan.i.savin · July 26, 2024, 10:19pm

I just added an Embedding Manager to the FlexiAI framework; in the next 24 to 48 hours I will add an image manager and publish the build on PyPI.
The docs include:

example of creating embeddings
example of finding similar texts
example of clustering texts
example of implementing semantic search
example of classifying texts
example of answering questions with context
example of sentiment classification with logistic regression
example of sentiment classification with logistic regression and data augmentation

See if this helps:

github.com: SavinRazvan/flexiai/blob/main/examples/Code examples/embeddings_manager.ipynb

