Currently, we have embeddings generated with the ada model stored in our vector DB, and we are planning to switch the embedding model to text-embedding-3-small (1536 dimensions).
Do we need to re-embed our entire corpus because of the change of embedding model?
We found that OpenAI uses the same tokenizer for both models, but ada and text-embedding-3-small have different architectures.
If we create embeddings for the document chunks with one model and embeddings for the queries with the other model, how will cosine similarity behave?
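To make the concern concrete, here is a minimal sketch (toy random projections standing in for the two encoders, not real OpenAI API calls) of why cosine similarity across two different embedding models is not meaningful: each model maps text into its own vector space, so dimensions with the same index carry unrelated meaning, and the score between a query from one model and a document from the other is essentially arbitrary.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Toy stand-ins for two embedding models: each "model" is a fixed random
# projection from an 8-dim input to its own 16-dim embedding space.
model_a = rng.normal(size=(16, 8))  # pretend old encoder (e.g. ada-style)
model_b = rng.normal(size=(16, 8))  # pretend new encoder

doc = rng.normal(size=8)  # the same underlying "text"

same_model = cosine_similarity(model_a @ doc, model_a @ doc)
mixed_models = cosine_similarity(model_a @ doc, model_b @ doc)

# Same text embedded by the same model: similarity is (near) 1.
# Same text embedded by two different models: the two spaces are
# unrelated, so the score is arbitrary and near 0 in expectation.
print(same_model, mixed_models)
```

The takeaway from the sketch: cosine similarity only ranks vectors that live in the same space, so queries and documents must be embedded by the same model, which means the whole corpus has to be re-embedded (not retrained) when the model changes.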
Please assist…