RAG is failing when the number of documents increases

Just a follow-up. I implemented my own Semantic Chunking strategy (Using gpt-4 API to Semantically Chunk Documents - #166 by SomebodySysop) as well as Small-to-Big chunk retrieval for better chunk context (Advanced RAG 01: Small-to-Big Retrieval | by Sophia Yang, Ph.D. | Towards Data Science).

I also deployed your “Deep Dive” strategy. Essentially, I take the top 50 (or even 100) cosine similarity search results and rate each chunk based on its relevance to the actual question asked. I rate one chunk per API call, which ensures the best model response. I then return the highest-rated chunks together as context to the model for a complete answer.
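For anyone who wants to try this flow, here is a minimal Python sketch of it under some assumptions. It is not the exact code from either implementation. The function names (`top_n_by_cosine`, `parse_rating`, `deep_dive`) and the `keep` and `min_rating` parameters are hypothetical, and `rate_chunk` is a placeholder for whatever single-chunk model call you use; it is assumed to take the question and one chunk and return the model's reply as text.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_by_cosine(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    n: int = 50,
) -> List[str]:
    """Step 1: return the text of the n chunks most similar to the query.

    `chunks` is a list of (text, embedding) pairs.
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:n]]


def parse_rating(reply: str) -> int:
    """Pull the first integer out of the model's reply and clamp it to 0-10.

    Replies with no number are treated as 0 (an assumption, not a rule).
    """
    match = re.search(r"\d+", reply)
    if match is None:
        return 0
    return max(0, min(10, int(match.group())))


def deep_dive(
    question: str,
    candidates: List[str],
    rate_chunk: Callable[[str, str], str],
    keep: int = 5,
    min_rating: int = 0,
) -> List[str]:
    """Step 2: rate each candidate on its own, then keep the highest rated.

    One call to `rate_chunk` per chunk, mirroring the one-chunk-at-a-time
    rating described above.
    """
    rated = [(parse_rating(rate_chunk(question, text)), text) for text in candidates]
    # Stable sort: chunks with equal ratings stay in their cosine-similarity order.
    rated.sort(key=lambda pair: pair[0], reverse=True)
    return [text for rating, text in rated[:keep] if rating >= min_rating]
```

In use, you would pass the output of `top_n_by_cosine` as `candidates` and join the returned chunks into the context for the final answer prompt. Because each rating call is independent, the calls could also be issued concurrently, though that is not shown here.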

For the embeddings, I'm using OpenAI’s new text-embedding-3-large model.

Not only is this process faster than I thought it was going to be (since each API call only returns a single rating number, in my case 0-10), but it is also far less expensive than I imagined (especially with the new gpt-4o-mini and gemini-1.5-flash models).

This works amazingly well. I actually thought I had conceived the concept of “Deep Dive”, but it looks like you beat me to it! Anyway, thank you so much for this contribution. It turned out to be the key to my issues with getting “comprehensive” responses.
