RAG is failing when the number of documents increases

sergeliatko · June 14, 2024, 9:59pm

That’s what I call “vector dilution”: when an embedded chunk contains more than one precise idea (what I call an atomic idea, because it doesn’t make sense to break it down any further), its position in the embedding space gets “blurry” or “diluted” across the ideas it contains.

So when you run the retrieval query, which in most cases contains only one idea and is more precise, it is hard to get a high-similarity match with the “diluted” vectors of the chunks you’re looking for.
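To make the effect concrete, here is a toy sketch. It assumes (as a simplification, not a claim about any particular embedding model) that a multi-idea chunk's vector behaves roughly like the average of the vectors of its ideas. The three "idea" vectors and their names are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average, used here as a stand-in for a multi-idea chunk."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical, fully unrelated "atomic ideas" as orthogonal toy vectors.
idea_pricing = [1.0, 0.0, 0.0]
idea_refunds = [0.0, 1.0, 0.0]
idea_shipping = [0.0, 0.0, 1.0]

query = idea_refunds                      # the query is about one idea only
atomic_chunk = idea_refunds               # chunk holding just that idea
mixed_chunk = mean_vector([idea_pricing, idea_refunds, idea_shipping])

atomic_score = cosine(query, atomic_chunk)
mixed_score = cosine(query, mixed_chunk)
print(f"atomic chunk: {atomic_score:.3f}, three-idea chunk: {mixed_score:.3f}")
```

In this toy setup the three-idea chunk scores noticeably lower than the atomic one even though it contains the exact idea the query asks about. Real embeddings are not orthogonal or simple averages, so treat the numbers as an illustration of the direction of the effect only.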

When your corpus is small, this is not easy to diagnose. But as the amount of data grows, you end up with tons of chunks spread all over the place in your rankings, and if your cap on how many chunks to include in the prompt is small (which it should be), you miss the important info.

Ideally, each chunk should contain only one idea, plus a means to trace it back to its source so you can pull more context into the prompt if needed (because atomic idea chunks are often not enough to answer complex questions). So your retrieval becomes multi-step: find the chunks that match the query, pull more context from the chunks’ source references, build the prompt, and only then answer.
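A minimal sketch of that multi-step flow, under loud assumptions: the `Chunk` fields, the function names, and the sample corpus are all hypothetical, and `overlap_score` is a crude word-overlap stand-in for a real embedding similarity search.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_id: str   # which document the chunk came from
    position: int    # index of the chunk inside that document

def overlap_score(query, text):
    """Toy relevance score: share of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def retrieve(query, chunks, top_k):
    """Step 1: find the chunks that best match the query."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:top_k]

def expand(hit, chunks, window=1):
    """Step 2: pull neighbouring chunks from the same source as extra context."""
    neighbours = [
        c for c in chunks
        if c.source_id == hit.source_id and abs(c.position - hit.position) <= window
    ]
    return sorted(neighbours, key=lambda c: c.position)

def build_prompt(query, chunks, top_k=1, window=1):
    """Step 3: assemble the prompt; answering happens only after this."""
    sections = []
    for hit in retrieve(query, chunks, top_k):
        context = " ".join(c.text for c in expand(hit, chunks, window))
        sections.append(f"[{hit.source_id}] {context}")
    return "Context:\n" + "\n".join(sections) + f"\n\nQuestion: {query}"

corpus = [
    Chunk("Refunds are issued within 14 days.", "policy.md", 0),
    Chunk("Contact support to start a refund.", "policy.md", 1),
    Chunk("Shipping takes 3 days.", "shipping.md", 0),
]
prompt = build_prompt("How are refunds issued?", corpus)
print(prompt)
```

The point of the source reference is visible here: only one atomic chunk matches the query, but the prompt also gets its neighbour from the same document, while the unrelated document stays out.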

But that’s easier said than done (unless you have a robust chunking tool, see some of my previous messages on the subject).

But you can still work even in this situation.

Here are some ideas:

Don’t cut the number of results based on similarity to the query, but rather on their “usefulness”: increase the number of chunks you pull out. To make them fit into your prompt, instead of pushing all the results in, select only those that either contain the answer or carry additional information that helps improve the answer. This way you trim your results in an extra step, but you improve the quality of the answer.
Once the results are pulled out, run them in parallel against a model trained to evaluate their “usefulness”. Then select only the ones that pass the test (again, I don’t have the link handy, but you can search in my messages on the forum to get more details).
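The filtering step could look roughly like the sketch below. Everything named here is an assumption for illustration: `judge_usefulness` is a placeholder keyword check standing in for a call to a trained "usefulness" model, and `STOPWORDS`, `filter_useful`, and the sample texts are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stopword list for the placeholder judge below.
STOPWORDS = {"how", "do", "the", "a", "are", "is", "to"}

def judge_usefulness(query, chunk):
    """Placeholder judge: in a real app this would ask a model whether the
    chunk contains the answer or helps improve it. Here it only checks for
    a shared non-stopword, which is NOT a real usefulness test."""
    query_terms = {w.strip("?.,!").lower() for w in query.split()} - STOPWORDS
    chunk_terms = {w.strip("?.,!").lower() for w in chunk.split()}
    return len(query_terms & chunk_terms) > 0

def filter_useful(query, candidates, max_workers=8):
    """Judge all candidates in parallel and keep those that pass, in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns verdicts in the same order as the candidates.
        verdicts = list(pool.map(lambda c: judge_usefulness(query, c), candidates))
    return [c for c, ok in zip(candidates, verdicts) if ok]

# Pull out more results than the prompt can hold, then trim by usefulness.
candidates = [
    "Refunds are issued within 14 days.",
    "Our office is closed on Sundays.",
    "Refunds need the order number.",
]
useful = filter_useful("How long do refunds take?", candidates)
print(useful)
```

Threads fit here because each judgment would normally be a network call to a model, so the waiting overlaps. The ranking order from retrieval is preserved, so you can still cap the final list by position after filtering.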

But with so little info about your app and its data schema, it’s hard to come up with something truly useful.
