Rule of thumb cosine similarity thresholds?

I have used 0.79 as the cosine similarity threshold for text-embedding-ada-002. This means that any result scoring below it is not considered similar enough to be included in the context.
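For anyone following along, here is a minimal sketch of the kind of threshold filter described above. The function names, the toy 2-dimensional vectors, and the `(text, vector)` candidate layout are assumptions made for illustration; real embeddings would come from the embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_by_threshold(query_vec, candidates, threshold):
    """Keep (text, score) pairs scoring at or above threshold, best first.

    candidates is a list of (text, vector) pairs; this layout is hypothetical.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in candidates]
    kept = [(text, score) for text, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy data, not real embeddings.
query = [1.0, 0.0]
docs = [("a", [1.0, 0.0]), ("b", [1.0, 1.0]), ("c", [0.0, 1.0])]
strict = filter_by_threshold(query, docs, 0.79)  # only "a" passes
loose = filter_by_threshold(query, docs, 0.3)    # "a" and "b" pass
```

The only thing that changes between models is the `threshold` argument, which is why the value has to be re-checked after switching.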

However, after switching to text-embedding-3-large, the same threshold no longer seems to work. Initial tests indicate that a lower threshold is needed.

I’m curious to learn about the rule of thumb for the similarity threshold that people have settled on with text-embedding-3-large.


It’s all relative.

You’ll need to see how your text clusters with the new embeddings.

The general observation has been that similarities are much lower across the board.
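One way to see how your own data behaves is to score a handful of pairs you have already judged as relevant or irrelevant and pick the cutoff that separates them best. The sketch below is only an illustration: `pick_threshold` is a hypothetical helper, the scores are made-up numbers rather than measurements from any model, and choosing by F1 is just one reasonable criterion.

```python
def pick_threshold(relevant_scores, irrelevant_scores):
    """Return (threshold, f1) for the cutoff with the best F1 on labeled scores.

    Every observed score is tried as a cutoff; a pair counts as retrieved
    when its score is >= the cutoff. Ties keep the lowest such cutoff.
    """
    cutoffs = sorted(set(relevant_scores) | set(irrelevant_scores))
    best_threshold, best_f1 = None, -1.0
    for cutoff in cutoffs:
        tp = sum(s >= cutoff for s in relevant_scores)    # relevant and kept
        fp = sum(s >= cutoff for s in irrelevant_scores)  # irrelevant but kept
        fn = len(relevant_scores) - tp                    # relevant but dropped
        if tp == 0:
            continue  # precision is undefined when nothing relevant is kept
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = cutoff, f1
    return best_threshold, best_f1

# Made-up similarity scores for pairs labeled by hand.
relevant = [0.55, 0.48, 0.41, 0.35]
irrelevant = [0.30, 0.22, 0.18, 0.12]
threshold, f1 = pick_threshold(relevant, irrelevant)
```

With real data the two groups usually overlap, so the chosen cutoff is a trade-off between missing relevant text and letting noise into the context.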


Actually, your answer was already in my question :)

That’s because there isn’t a definitive answer to your question.

Hi!

TL;DR: try 0.3 as a starting threshold.

Yes, using rule-of-thumb values to start one’s own exploration is a good way forward for developers who are mostly interested in practical outcomes.

You can take a look at what other users from the community have found to work well for them, and from there fine-tune the threshold for your own use case.

Hope this helps.
