How to handle a word that appears in both the query and the nodes but has a different meaning in RAG?

For example, I have a RAG system and the user asks this query:

  1. what is the long yellow thing monkeys like?

I expect banana, but the retrieved nodes are:

Document: WACKY MONKEY CANDY, Score: 1.0

Document: YELLOW MELON LB, Score: 0.9968768372512178

Document: YELLOW/ RED DATES LB, Score: 0.996724735419526

Document: YELLOW/ RED DATES LB, Score: 0.9966791263391769

Document: CHHEDAS YELLOW BANANA 150GM, Score: 0.996192566724983

Document: YELLOW MELON LB, Score: 0.9961317709478378

How can I handle this?

It depends on how your RAG retrieves nodes. If it uses an embedding model to compute the similarity between the query and the stored text and then ranks the nodes by that score, the likely cause is that the user’s query does not contain the word “banana” and the embedding model failed to associate “banana” with the description in the query.
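To make the ranking step concrete, here is a minimal sketch of score-based retrieval. It assumes cosine similarity as the score, and `embed_words` is a hypothetical toy stand-in (plain word counts), not a real embedding model, so it only matches on shared surface words. The document strings are taken from the question.

```python
import math
from collections import Counter

def embed_words(text):
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().replace("/", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs, embed=embed_words):
    """Score every document against the query and sort by score, highest first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "WACKY MONKEY CANDY",
    "YELLOW MELON LB",
    "YELLOW/ RED DATES LB",
    "CHHEDAS YELLOW BANANA 150GM",
]
query = "what is the long yellow thing monkeys like?"

for score, doc in rank(query, docs):
    print(f"{score:.3f}  {doc}")
```

With this toy scorer the banana item gets no advantage over the other items that contain "yellow", because nothing in the query literally says "banana". A real embedding model is supposed to close that gap, but as the scores in the question suggest, it does not always do so.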

While embedding models can capture meaning and calculate similarity to some extent even with words not present in the original text, they are not perfect.

One approach is to run the same query through different embedding models and compare which nodes each one retrieves.
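A rough way to run that comparison is to check where the expected node lands in each model's ranking. The sketch below assumes cosine similarity and uses two hypothetical toy embedders (`embed_words`, `embed_trigrams`) as placeholders. In practice you would replace them with functions that call the real embedding models you want to compare.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def embed_words(text):
    """Placeholder model A (assumption): lowercase word counts."""
    return Counter(text.lower().replace("/", " ").split())

def embed_trigrams(text):
    """Placeholder model B (assumption): character trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def rank_of(expected, query, docs, embed):
    """1-based position of the expected document in the ranking for one model."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked.index(expected) + 1

def compare(models, query, docs, expected):
    """Map each model name to the rank it gives the expected document."""
    return {name: rank_of(expected, query, docs, embed)
            for name, embed in models.items()}

docs = [
    "WACKY MONKEY CANDY",
    "YELLOW MELON LB",
    "YELLOW/ RED DATES LB",
    "CHHEDAS YELLOW BANANA 150GM",
]
models = {"words": embed_words, "trigrams": embed_trigrams}
results = compare(models, "what is the long yellow thing monkeys like?",
                  docs, "CHHEDAS YELLOW BANANA 150GM")
print(results)
```

A lower rank is better. Running this over a handful of queries with known expected answers gives a fairer picture than judging a model on a single query.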

I hope this helps in some way.
