Query embedding threshold evaluation with curbing dimension

Welcome to the community!

Yeah, Ada (text-embedding-ada-002) produced cosine similarities in a range of roughly 0.6 to 1 IIRC, so a threshold of 0.75 seemed sensible for it at the time.

The new models (text-embedding-3) spread their scores across roughly 0 to 1, so if you use a threshold approach you will definitely need to adjust it.
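To make the threshold approach concrete, here's a minimal sketch in plain Python: compute cosine similarity between the query and each stored vector, then keep only the hits at or above a cutoff. The function names and the toy 2-D vectors are illustrative assumptions, not real embeddings; with real data the right cutoff depends on which embedding model produced the vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), which lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length vector")
    return dot / (norm_a * norm_b)


def filter_by_threshold(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs scoring at or above `threshold`, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical): scores are a = 1.0, b = 1/sqrt(2) ~ 0.707, c = 0.0.
docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
print(filter_by_threshold([1.0, 0.0], docs, 0.75))  # only "a" survives
print(filter_by_threshold([1.0, 0.0], docs, 0.3))   # "a" and "b" survive
```

The same query loses a plausibly relevant hit ("b") at an Ada-era cutoff of 0.75 but keeps it at 0.3, which is why a threshold tuned for one model shouldn't be carried over to another.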

I use a different ranking mechanism, so I can’t advise you on which threshold to pick, but I think results scoring as low as 0.3 or even 0.2 can still be relevant, depending on what you’re doing.
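The post doesn't say which ranking mechanism is used instead of a threshold. One common alternative, shown here only as an illustration and not as the author's method, is to rank every candidate and keep the top k, optionally with a low floor (e.g. 0.2) to drop clearly unrelated hits. The function name `top_k` and the `floor` parameter are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length vector")
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    k: int,
    floor: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring docs; if `floor` is set, also drop scores below it."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items())
    best = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    if floor is not None:
        best = [(doc_id, score) for doc_id, score in best if score >= floor]
    return best


docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
print(top_k([1.0, 0.0], docs, k=2))             # "a" and "b"
print(top_k([1.0, 0.0], docs, k=3, floor=0.2))  # "c" (score 0.0) is dropped
```

Ranking this way makes the result size predictable regardless of how a given model distributes its scores; the floor only has to be rough.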

It really depends on how many vectors you have in your database and what kind of accuracy you need. I always like to reference this post: “It looks like 'text-embedding-3' embeddings are truncated/scaled versions from higher dim version” (#14 by LinqLover). But if you’re working at high volume, you’ll need to decide based on your own content and your downstream pipeline.
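Since the thread is about curbing dimensions, here's a minimal sketch of shortening a vector by hand: keep the first `dims` components and rescale to unit length, so a plain dot product is again a cosine similarity. This assumes you truncate locally rather than requesting fewer dimensions from the API; the helper names are hypothetical, and the 4-D toy vectors are deliberately exaggerated to show that scores move when you cut dimensions.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("truncated vector has zero length")
    return [x / norm for x in head]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length vectors, the dot product equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical unit vectors, exaggerated for illustration.
q = [0.5, 0.5, 0.5, 0.5]
d = [0.5, 0.5, -0.5, -0.5]
print(dot(q, d))  # 0.0 at full dimension
print(dot(truncate_and_normalize(q, 2), truncate_and_normalize(d, 2)))  # about 1.0
```

With real embeddings the shift is usually much smaller than in this toy case, but any threshold you picked at full dimension should be re-checked against your own data after truncation.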
