Results returned for irrelevant queries that are not in the data

I used OpenAI embeddings to create the embeddings. When I run a vector search with a question that is irrelevant to the data, it still returns results, even though they are irrelevant to the data. How can I fix this without using a threshold?

You do not.

Vector search uses cosine similarity between embeddings as a proxy for relevance.

Two entirely unrelated vectors will still have a cosine similarity score. So, the embeddings returned are just the best of a set of bad options. The only way to ensure the model doesn’t pull in irrelevant data is to establish a threshold for relevance.
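To make that concrete, here is a minimal sketch of a top-k search with and without a cutoff. Everything in it is an assumption for illustration: the three-dimensional toy vectors stand in for real OpenAI embeddings, `search` and `min_score` are hypothetical names, and the 0.5 cutoff is arbitrary. A real cutoff would need to be tuned against your own data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, docs, top_k=3, min_score=None):
    """Rank docs by cosine similarity; optionally drop weak matches.

    `docs` is a list of (text, vector) pairs. `min_score` is a
    hypothetical relevance cutoff, not part of any library API.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if min_score is not None:
        scored = [pair for pair in scored if pair[0] >= min_score]
    return scored[:top_k]

# Toy vectors, made up for illustration (not real embeddings).
docs = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.7, 0.3, 0.1]),
]
off_topic_query = [0.0, 0.2, 0.9]

# Without a cutoff, both documents come back despite low scores.
print(search(off_topic_query, docs))

# With a cutoff, nothing clears the bar, so the result is empty.
print(search(off_topic_query, docs, min_score=0.5))
```

An empty result is the signal you want here: your application can then answer "I don't have information on that" instead of passing weak matches along.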
