RAG is failing when the number of documents increases

That’s what I call “vector dilution”: when an embedded chunk contains more than one precise idea (I call these atomic ideas, because it doesn’t make sense to break them down any further), its position in the embedding space gets “blurry”, or “diluted”, across the ideas it contains.

So when you run the retrieval query, which in most cases contains only one idea and is more precise, it is hard to get a high-similarity match with the “diluted” vectors of the chunks you’re looking for.
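
To make the effect concrete, here is a toy sketch. It assumes (this is my simplification, not a property of any particular embedding model) that a chunk mixing several ideas lands roughly at the average of the vectors of those ideas. The 3-d vectors and names like `idea_a` are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise average, used here as a stand-in for a multi-idea chunk."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy "embeddings": each axis stands for one unrelated atomic idea (assumption).
idea_a = [1.0, 0.0, 0.0]
idea_b = [0.0, 1.0, 0.0]
idea_c = [0.0, 0.0, 1.0]

query = idea_a                                        # the query is about one idea
atomic_chunk = idea_a                                 # chunk holding only that idea
mixed_chunk = mean_vector([idea_a, idea_b, idea_c])   # chunk mixing three ideas

atomic_score = cosine(query, atomic_chunk)   # 1.0
mixed_score = cosine(query, mixed_chunk)     # 1 / sqrt(3), about 0.577

print(f"atomic chunk: {atomic_score:.3f}, mixed chunk: {mixed_score:.3f}")
```

Under that assumption, the chunk that does contain the answer scores noticeably lower just because it also talks about two other things, and that gap is what pushes it down the rankings once more chunks compete for the same slots.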

When your corpus is small, this is not easy to diagnose. But as the amount of data grows, you end up with tons of chunks spread all over the place in your rankings, and if the cap on how many chunks you include in the prompt is small (which it should be), you miss the important info.

Ideally, each chunk should contain only one idea, plus a way to trace it back to its source so you can pull more context into the prompt if needed (because atomic idea chunks are often not enough to answer complex questions). So your retrieval becomes multi-step: find the chunks that match the query, pull more context from the chunks’ source references, build the prompt, and only then answer.
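
The multi-step flow above could be sketched roughly like this. Everything here is hypothetical: the `Chunk` fields, the function names, the `window` parameter, and the 2-d toy vectors standing in for real embeddings. It only shows the shape of the idea, not a production retriever.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    source_id: str   # which document the chunk came from
    position: int    # index of the chunk inside that document
    text: str
    vector: tuple    # toy embedding; a real one would come from an embedding model

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_matches(query_vector, chunks, top_k):
    """Step 1: rank atomic chunks by similarity to the query, keep the top_k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]

def expand_context(matches, chunks, window=1):
    """Step 2: follow each match back to its source and pull neighbouring chunks."""
    wanted = set()
    for match in matches:
        for pos in range(match.position - window, match.position + window + 1):
            wanted.add((match.source_id, pos))
    selected = [c for c in chunks if (c.source_id, c.position) in wanted]
    return sorted(selected, key=lambda c: (c.source_id, c.position))

def build_prompt(question, context_chunks):
    """Step 3: assemble the prompt; answering happens only after this."""
    context = "\n".join(f"[{c.source_id}#{c.position}] {c.text}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

chunks = [
    Chunk("manual", 0, "The device ships with a 12 V adapter.", (0.1, 0.9)),
    Chunk("manual", 1, "Hold the reset button for 5 seconds.", (0.9, 0.1)),
    Chunk("manual", 2, "After a reset, the LED blinks green.", (0.6, 0.4)),
    Chunk("faq", 0, "Warranty lasts two years.", (0.0, 1.0)),
]

question = "How do I reset the device?"
query_vector = (1.0, 0.0)  # pretend embedding of the question

matches = find_matches(query_vector, chunks, top_k=1)
context = expand_context(matches, chunks, window=1)
prompt = build_prompt(question, context)
print(prompt)
```

The point of the design is that matching is done on small, precise chunks, while the prompt is built from the wider neighbourhood in the source, so the cap on matched chunks can stay small.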

But that’s easier said than done (unless you have a robust chunking tool; see some of my previous messages on the subject).

But you can still make it work even in this situation.

Here are some ideas:

Don’t cut the number of results based on similarity to the query, but rather on their “usefulness”: increase the number of chunks you pull out. Then, to make them fit into your prompt, instead of pushing all the results in, select only those that either contain the answer or add information that helps improve the answer. This way you trim your results in an extra step, but you improve the quality of the answer.
Once the results are pulled out, run them in parallel against a model trained to evaluate their “usefulness”. Then select only the ones that pass the test (again, I don’t have the link handy, but you can search my messages on the forum for more details).
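
A rough sketch of that extra filtering step, assuming you have already pulled a larger candidate set. `judge_usefulness` is a placeholder I made up so the example runs offline: it does a crude keyword-overlap check, whereas in practice it would be a call to whatever model you trained or prompted to judge usefulness. `filter_useful` and `max_workers` are likewise hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

STOPWORDS = {"the", "a", "is", "how", "do", "i", "of", "to", "what"}

def words(text):
    """Lowercased words with basic punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()}

def judge_usefulness(question, chunk_text):
    """Placeholder judge (assumption): swap in a call to your usefulness model.
    Here a chunk counts as useful if it shares a non-stopword with the question."""
    overlap = (words(question) & words(chunk_text)) - STOPWORDS
    return len(overlap) > 0

def filter_useful(question, candidates, max_workers=8):
    """Judge every candidate in parallel and keep only those that pass."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns verdicts in the same order as the candidates
        verdicts = list(pool.map(lambda text: judge_usefulness(question, text), candidates))
    return [text for text, keep in zip(candidates, verdicts) if keep]

# Over-retrieved candidates (more than would fit the prompt cap).
candidates = [
    "Hold the reset button for 5 seconds.",
    "Warranty lasts two years.",
    "After a reset, the LED blinks green.",
    "The box includes a 12 V adapter.",
]

useful = filter_useful("How do I reset the device?", candidates)
print(useful)
```

Threads are a reasonable fit here because a real judge would mostly be waiting on network calls to a model, so the candidates can be evaluated side by side instead of one after another.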

But with so little info about your app and your data schema, it’s hard to come up with something truly useful.
