Seeking help from the experts on improving GraphRAG Drift Search!

While studying the Drift Search mechanism in GraphRAG, I observed a potential efficiency issue related to entity redundancy. Here’s my analysis:

    1. Redundancy in Sub-queries (in drift search):

When configuring the topK parameter and search depth, sub-queries often retrieve overlapping entities from the knowledge graph (KG), leading to redundant results. For instance, if Entity A is already extracted in an initial query, subsequent sub-queries might re-extract Entity A instead of prioritizing new candidates. Would enforcing a deduplication mechanism—where previously retrieved entities are excluded from future sub-queries—improve both efficiency and result diversity? (See the sketch after this list for what I have in mind.)

    2. Missed KG Information:

Despite Drift Search achieving 89% accuracy in my benchmark (surpassing global/local search), critical entities are occasionally omitted due to redundant sub-query patterns. Could iterative refinement strategies (e.g., dynamically adjusting topK based on query context or introducing entity “exclusion lists”) help mitigate this issue while maintaining computational efficiency?
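To make concrete what I mean by an exclusion list, here is a minimal Python sketch. `search_fn` is just a stand-in for whatever executes a single Drift Search sub-query (not the actual GraphRAG API), and I'm assuming it returns (entity_id, score) pairs:

```python
def run_subqueries(subqueries, search_fn, top_k=10):
    """Run sub-queries while excluding entities retrieved by earlier ones.

    `search_fn(query, top_k, exclude)` is a hypothetical retrieval hook,
    assumed to return (entity_id, score) pairs; it is not GraphRAG's API.
    """
    seen = set()       # entity ids pulled in by earlier sub-queries
    collected = []
    for query in subqueries:
        results = search_fn(query, top_k=top_k, exclude=seen)
        for entity_id, score in results:
            if entity_id in seen:       # skip re-extracted entities
                continue
            seen.add(entity_id)
            collected.append((entity_id, score))
    return collected
```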

Context:

My goal is to enhance Drift Search’s coverage of underrepresented entities in the KG without sacrificing its latency advantages. My current hypotheses are that redundancy control and adaptive depth allocation might address these gaps, but I’m not sure I’m on the right track, so I could really use your help!

It saves you very little. Let’s say you have a pass with top-k = 100 and look at what exclusion saves on a second query: you’d now have an additional layer of comparisons to do against the index numbers or chunk numbers already extracted, which adds computation across the remaining corpus of all-k == 10000 instead of reducing it.
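Roughly, the arithmetic looks like this (a back-of-envelope sketch; the corpus size and the cost model are illustrative assumptions, not measurements of GraphRAG):

```python
# Illustrative numbers only: "corpus_size" and "top_k" are assumptions.
corpus_size = 10_000   # all-k: total chunks/entities scored per query
top_k = 100            # chunks already extracted by the first pass

# Without exclusion: the second query scores the full corpus.
scored_without_exclusion = corpus_size

# With exclusion: you skip the 100 known chunks, but every remaining
# candidate still pays a membership check against the exclusion list.
scored_with_exclusion = corpus_size - top_k   # 9,900 similarity computations
extra_membership_checks = corpus_size         # ~10,000 lookups against seen ids

savings_fraction = top_k / corpus_size        # 0.01 -> about 1% fewer scores
print(f"Similarity computations saved: {savings_fraction:.0%}")
print(f"Extra exclusion-list lookups:  {extra_membership_checks}")
```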

Instead, if you are doing multiple search queries for the same purpose, rather than being dismayed at seeing the same results, you can add a consolidation stage where you don’t just de-duplicate: you turn that duplication into a higher weight.
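A minimal sketch of that consolidation stage, assuming each sub-query returns (chunk_id, score) pairs (a simplification of the real result objects):

```python
from collections import defaultdict

def consolidate(subquery_results):
    """Merge results from several sub-queries; duplicates raise the weight."""
    merged = defaultdict(lambda: {"score": 0.0, "hits": 0})
    for results in subquery_results:
        for chunk_id, score in results:
            merged[chunk_id]["score"] += score   # duplicates accumulate score
            merged[chunk_id]["hits"] += 1        # hit count doubles as a weight

    # Rank by accumulated evidence instead of discarding repeats.
    return sorted(merged.items(), key=lambda kv: kv[1]["score"], reverse=True)

# "A" shows up in two sub-queries, so it outranks the single-hit results.
ranked = consolidate([[("A", 0.9), ("B", 0.8)], [("A", 0.7), ("C", 0.6)]])
print(ranked[0])   # 'A' with score ~1.6 over 2 hits
```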

Architecturally, if you are I/O-limited, you can run the pre-refinement exhaustive searches against a reduced-bit-depth or lower-dimensionality copy of the embeddings that you keep in parallel to the full-quality vector database.
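Something like this two-stage pattern, as a numpy sketch: the int8 copy, the array sizes, and the coarse/refine split are all illustrative assumptions, not how GraphRAG stores its vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: full-precision embeddings plus a cheap low-bit copy.
full = rng.normal(size=(10_000, 768)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)
coarse = np.clip(full / np.abs(full).max() * 127, -127, 127).astype(np.int8)

def search(query, top_k=100, refine_k=1000):
    q = query / np.linalg.norm(query)
    # Stage 1: exhaustive scan over the reduced-bit-depth copy (cheaper I/O).
    coarse_scores = coarse.astype(np.float32) @ q
    candidates = np.argpartition(-coarse_scores, refine_k)[:refine_k]
    # Stage 2: re-rank only the candidates against the full-quality vectors.
    exact = full[candidates] @ q
    order = np.argsort(-exact)[:top_k]
    return candidates[order], exact[order]

ids, scores = search(rng.normal(size=768).astype(np.float32))
```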

Or, if you are time-limited and don’t want to sit around for more query rewriting, you can reuse the searches you’ve already done over your parallel chunk sizes and weight results by their overlaps…

New name: HyDE-HyDE-HyDE-HO - multiple hypothetical document embeddings - hypothetical optimization