Just in case you think no one is paying attention when you provide these very detailed posts:
- Reasoning Behind “Comprehension Level”, “Filter Results” and “Deep Dive” Strategies.
- RAG is failing when the number of documents increase - #24 by sergeliatko
- Ideally, your chunks should contain only one idea at a time, plus a means to trace them back to their source so you can pull more context into the prompt if needed (because atomic idea chunks often are not enough to answer complex questions). So your retrieval becomes multi-step: find chunks that match the query, pull more context from the chunks' source references, build the prompt - and only then answer.
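A minimal sketch of that multi-step flow might look like this. The `search`, `fetch_source`, and `answer` callables are hypothetical stand-ins for your vector store, document store, and LLM call - not any specific API:

```python
# Sketch of multi-step retrieval: match chunks, follow their source
# references for extra context, build the prompt, then answer.
# All three callables are illustrative placeholders.

def multi_step_retrieve(query, search, fetch_source, answer, k=5):
    # Step 1: find atomic-idea chunks that match the query.
    chunks = search(query, k=k)
    # Step 2: follow each chunk's source reference to pull more context.
    contexts = [fetch_source(c["source_id"]) for c in chunks]
    # Step 3: build the prompt from the chunks plus their context.
    prompt = "\n\n".join(
        c["text"] + "\n" + ctx for c, ctx in zip(chunks, contexts)
    )
    # Step 4: only then answer.
    return answer(query, prompt)
```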
- Hierarchical | Semantic Chunking (Using gpt-4 API to Semantically Chunk Documents - #166 by SomebodySysop) is designed to create “atomic idea” chunks.
- Comprehension Level (RAG is not really a solution - #97 by SomebodySysop) retrieves chunks adjacent (within a set radius) to the key chunk (the chunk identified by the query) in order to provide more context to the key chunk.
- This idea was initially inspired by this post: Retrieving “Adjacent” Chunks for Better Context - Support - Weaviate Community Forum
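The adjacent-chunk idea can be sketched in a few lines, assuming each chunk carries a document id and a sequential index within its document (the field names here are illustrative, not from any particular vector store):

```python
# Sketch of radius-based "Comprehension Level" retrieval: given the key
# chunk identified by the query, also return its neighbors within a set
# radius of positions in the same document.

def with_adjacent(key_chunk, all_chunks, radius=2):
    """Return the key chunk plus same-document neighbors within `radius`."""
    doc = key_chunk["doc_id"]
    idx = key_chunk["index"]
    return sorted(
        (c for c in all_chunks
         if c["doc_id"] == doc and abs(c["index"] - idx) <= radius),
        key=lambda c: c["index"],
    )
```

In practice the neighbor lookup would be a metadata filter in the vector database rather than an in-memory scan, but the radius logic is the same.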
- Don’t cut the number of results based on similarity to the query, but rather on their “usefulness”: increase the number of chunks you pull out. To make them fit into your prompt, instead of pushing all results there, select those that either contain the answer or additional information that helps improve the answer. This way you trim your results in an extra step, but you improve the quality of the answer.
Once the results are pulled out, run them in parallel against a model trained to evaluate their “usefulness”, then select only the ones that pass the test.
- Deep Dive (RAG is failing when the number of documents increase - #26 by SomebodySysop) automatically increases the number of results returned.
- Filter Results (Using gpt-4 API to Semantically Chunk Documents - #172 by SomebodySysop) then filters the returned chunks based on their relevance to the question.
- The result of the two strategies above is answers that are more comprehensive and that aren’t diluted by unnecessary noise. You may expand your search to 100 or 200 chunks, but each chunk is examined to determine its relevance to the actual question. So, in reality, you end up returning only a dozen or fewer highly relevant chunks to the model, dramatically decreasing your prompt size while increasing the quality of your responses.
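The parallel usefulness check above can be sketched like so. Here `score_usefulness` is a hypothetical stand-in for a call to the evaluation model; the threshold and thread-based fan-out are illustrative choices, not the exact implementation:

```python
# Sketch of the "Filter Results" step: score every retrieved chunk for
# usefulness to the question in parallel, keep only those that pass.
from concurrent.futures import ThreadPoolExecutor


def filter_results(question, chunks, score_usefulness, threshold=0.5):
    # Fan out one scoring call per chunk; each call is independent,
    # so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda c: score_usefulness(question, c), chunks))
    # Keep only chunks whose usefulness score clears the threshold.
    return [c for c, s in zip(chunks, scores) if s >= threshold]
```

With a fast scoring model, this lets you cast a wide net (100-200 candidates) and still hand the answering model only the dozen or so chunks that actually matter.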
This combination of tools has been working fabulously. Thank you, again, for your contributions!