I am migrating to the newest embedding models. We have been using “text-embedding-ada-002” in our vector database, and we found that when using “text-embedding-3-small” (for both new queries and existing database embeddings, which we regenerated with the new model) the cosine similarity scores come out much lower than with ada. For example, the same search query against the same documents could give a relevance score of 0.70 or above with ada, and now we get relevance scores around 0.4. Any idea why that could happen?
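For context, here is roughly the comparison I’m running (a simplified sketch using the official `openai` Python client; the query and document strings are placeholders, and the real pipeline reads document vectors from our vector database rather than embedding them on the fly):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(text: str, model: str) -> np.ndarray:
    """Return the embedding vector for `text` from the given model."""
    resp = client.embeddings.create(model=model, input=text)
    return np.array(resp.data[0].embedding)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query = "example search query"        # placeholder
doc = "example document text"         # placeholder

for model in ("text-embedding-ada-002", "text-embedding-3-small"):
    q, d = embed(query, model), embed(doc, model)
    print(model, cosine_similarity(q, d))
```

With our data, the ada pairs print around 0.70+ and the text-embedding-3-small pairs around 0.4.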
Just spitballing here, but if they are different-dimensionality models, couldn’t that be the reason? Cosine similarity is already scale-normalized (unlike a raw dot product), but each model’s embedding space has its own distribution of scores, so you may just need to mentally recalibrate your similarity thresholds for this model and go with it. PS: I only learned the difference between the two approaches yesterday, so take that for what it’s worth.
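A quick toy example of what I mean by scale-normalized (plain numpy, nothing model-specific): scaling a vector changes the dot product but leaves the cosine similarity untouched, so the lower scores can’t just be a magnitude effect.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.5, 1.0])


def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


print(np.dot(a, b), cos(a, b))            # baseline dot product and cosine
print(np.dot(a, 10 * b), cos(a, 10 * b))  # dot product grows 10x, cosine is unchanged
```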
No worries, Kevin, thanks for replying. I’m also quite new to experimenting with embeddings myself.
I guess I was expecting to get very different vectors, if not because of the different dimensionality of the embeddings, then at least because each model maps text into a different embedding space. However, I was expecting a similar range of similarity scores. I wonder if “text-embedding-3” similarity scores are expected to be generally lower in any case, given its particular way of computing vectors, or if it’s something else that may be related to my particular dataset.
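One sanity check I’m planning to run, to rule out my dataset: embed the same fixed pairs with both models and compare the score ranges directly (a sketch with hypothetical example pairs; the `openai` client calls are the same as above):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical related/unrelated pairs, just to probe each model's score range.
pairs = [
    ("capital of France", "Paris is the capital of France."),
    ("python list sort", "Use list.sort() or sorted() in Python."),
    ("python list sort", "The weather was cold in the mountains."),
]


def embed_batch(texts, model):
    resp = client.embeddings.create(model=model, input=texts)
    return [np.array(item.embedding) for item in resp.data]


for model in ("text-embedding-ada-002", "text-embedding-3-small"):
    scores = []
    for query, doc in pairs:
        q, d = embed_batch([query, doc], model)
        scores.append(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
    print(f"{model}: min={min(scores):.2f} max={max(scores):.2f}")
```

If text-embedding-3-small scores come out uniformly lower even on these generic pairs, that would suggest the model simply has a different score distribution rather than something odd in my data.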