While studying the Drift Search mechanism in GraphRAG, I observed a potential efficiency issue related to entity redundancy. Here’s my analysis:
- Redundancy in sub-queries (in Drift Search):
When the topK parameter and search depth are configured, sub-queries often retrieve overlapping entities from the knowledge graph (KG), leading to redundant results. For instance, if Entity A was already extracted by the initial query, subsequent sub-queries might re-extract Entity A instead of prioritizing new candidates. Would enforcing a deduplication mechanism, where previously retrieved entities are excluded from future sub-queries, improve both efficiency and result diversity?
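To make the deduplication idea concrete, here is a minimal sketch of what I have in mind. It is not GraphRAG code: `retrieve_with_exclusion`, the `Retriever` signature, the `overfetch` factor, and the toy index are all hypothetical names I made up for illustration. The assumption is that each sub-query can ask the retriever for a ranked list of `(entity_id, score)` pairs.

```python
from typing import Callable, Iterable

# Hypothetical retriever signature (an assumption, not a GraphRAG API):
# (query, k) -> ranked list of (entity_id, score).
Retriever = Callable[[str, int], list[tuple[str, float]]]


def retrieve_with_exclusion(
    sub_queries: Iterable[str],
    retrieve: Retriever,
    top_k: int,
    overfetch: int = 2,
) -> list[list[str]]:
    """Run sub-queries in order, skipping entities returned by earlier ones."""
    seen: set[str] = set()
    results: list[list[str]] = []
    for query in sub_queries:
        # Over-fetch so that filtering can still leave up to top_k new entities.
        candidates = retrieve(query, top_k * overfetch)
        fresh: list[str] = []
        for entity_id, _score in candidates:
            if entity_id not in seen:
                seen.add(entity_id)
                fresh.append(entity_id)
            if len(fresh) == top_k:
                break
        results.append(fresh)
    return results


# Toy stand-in for the KG retriever, only for demonstration.
TOY_INDEX = {
    "q1": ["A", "B", "C", "D"],
    "q2": ["A", "C", "E", "F"],
}


def toy_retrieve(query: str, k: int) -> list[tuple[str, float]]:
    ranked = TOY_INDEX.get(query, [])
    return [(eid, 1.0 / (rank + 1)) for rank, eid in enumerate(ranked[:k])]


print(retrieve_with_exclusion(["q1", "q2"], toy_retrieve, top_k=2))
```

In this toy run the second sub-query skips Entity A (already taken by the first) and returns C and E instead. The over-fetch is the part I'm least sure about, since it trades extra retrieval cost for a better chance of filling the topK slots with new entities.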
- Missed KG information:
Although Drift Search reached 89% accuracy in my benchmark (surpassing global and local search), critical entities are occasionally omitted because of redundant sub-query patterns. Could iterative refinement strategies (e.g., dynamically adjusting topK based on query context, or introducing entity “exclusion lists”) help mitigate this issue while maintaining computational efficiency?
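As one possible reading of "dynamically adjusting topK", here is a small heuristic sketch. It is purely my own assumption rather than anything Drift Search does today: `next_top_k`, `base_k`, `max_k`, and the `low_novelty` threshold are hypothetical. The idea is to widen topK when a sub-query mostly returns entities that are already on the exclusion list, and to fall back toward the base value once results are mostly new, with `max_k` capping the extra cost.

```python
def next_top_k(
    current_k: int,
    returned: list[str],
    excluded: set[str],
    base_k: int = 10,
    max_k: int = 40,
    low_novelty: float = 0.5,
) -> int:
    """Choose topK for the next sub-query from the share of new entities in this one."""
    if not returned:
        # Nothing to measure, so keep the current setting.
        return current_k
    new_count = sum(1 for entity_id in returned if entity_id not in excluded)
    novelty = new_count / len(returned)
    if novelty < low_novelty:
        # Mostly repeats: look deeper, but never beyond max_k.
        return min(max_k, current_k * 2)
    # Mostly new entities: shrink back toward the base value.
    return max(base_k, current_k // 2)


# Example: three of four returned entities were already excluded, so topK doubles.
print(next_top_k(10, ["A", "B", "C", "D"], {"A", "B", "C"}))
```

Whether doubling and halving are the right step sizes is an open question; I only picked them to keep the sketch simple.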
Context:
My goal is to improve Drift Search’s coverage of underrepresented entities in the KG without sacrificing its latency advantages. My current hypothesis is that redundancy control and adaptive depth allocation might address these gaps. I’m not sure whether I’m on the right track, so any guidance would be much appreciated!