I am migrating to the newest embedding models. We have been using “text-embedding-ada-002” in our vector database, and we found that when using “text-embedding-3-small” (for both new queries and existing database embeddings, which we regenerated with the new model) the cosine similarity scores come out much lower than with ada. For example, the same search query against the same documents could give a relevance score of 0.70 or above with ada, and now we get relevance scores around 0.4. Any idea why that could happen?
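For context, here is roughly the comparison I’m running (a simplified sketch using the official `openai` Python client; the query and document strings are placeholders, and the real pipeline reads document vectors from our vector database rather than embedding them on the fly):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(text: str, model: str) -> np.ndarray:
    """Return the embedding vector for `text` from the given model."""
    resp = client.embeddings.create(model=model, input=text)
    return np.array(resp.data[0].embedding)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query = "example search query"        # placeholder
doc = "example document text"         # placeholder

for model in ("text-embedding-ada-002", "text-embedding-3-small"):
    q, d = embed(query, model), embed(doc, model)
    print(model, cosine_similarity(q, d))
```

With our data, the ada pairs print around 0.70+ and the text-embedding-3-small pairs around 0.4.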
Just spitballing here, but if they are different-dimensionality models, couldn’t that be the reason? Cosine similarity is already scale-normalized (unlike a raw dot product), but each model’s embedding space has its own distribution of scores, so you may just need to mentally recalibrate your similarity thresholds for this model and go with it. PS: I only learned the difference between the two approaches yesterday, so take that for what it’s worth.
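A quick toy example of what I mean by scale-normalized (plain numpy, nothing model-specific): scaling a vector changes the dot product but leaves the cosine similarity untouched, so the lower scores can’t just be a magnitude effect.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.5, 1.0])


def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


print(np.dot(a, b), cos(a, b))            # baseline dot product and cosine
print(np.dot(a, 10 * b), cos(a, 10 * b))  # dot product grows 10x, cosine is unchanged
```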
No worries, Kevin, thanks for replying. I’m also quite new to experimenting with embeddings myself.
I guess I was expecting to get very different vectors, if not because of the different dimensionality of the embeddings, then at least because each model maps text into a different embedding space. However, I was expecting a similar range of similarity scores. I wonder if “text-embedding-3” similarity scores are expected to be generally lower in any case, given its particular way of computing vectors, or if it’s something else that may be related to my particular dataset.
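One sanity check I’m planning to run, to rule out my dataset: embed the same fixed pairs with both models and compare the score ranges directly (a sketch with hypothetical example pairs; the `openai` client calls are the same as above):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical related/unrelated pairs, just to probe each model's score range.
pairs = [
    ("capital of France", "Paris is the capital of France."),
    ("python list sort", "Use list.sort() or sorted() in Python."),
    ("python list sort", "The weather was cold in the mountains."),
]


def embed_batch(texts, model):
    resp = client.embeddings.create(model=model, input=texts)
    return [np.array(item.embedding) for item in resp.data]


for model in ("text-embedding-ada-002", "text-embedding-3-small"):
    scores = []
    for query, doc in pairs:
        q, d = embed_batch([query, doc], model)
        scores.append(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
    print(f"{model}: min={min(scores):.2f} max={max(scores):.2f}")
```

If text-embedding-3-small scores come out uniformly lower even on these generic pairs, that would suggest the model simply has a different score distribution rather than something odd in my data.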