It saves you very little. Let's say you have a first pass of top-k 100 and you look at what it saves on a second query: you'd have an additional layer of comparisons to do against the index numbers or chunk numbers already extracted, which adds computation on top of scanning the remaining corpus of all-k == 10000 rather than reducing it.
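A minimal sketch of that point, with synthetic NumPy data and illustrative sizes (10000 vectors of 384 dims, none of which come from the original): the second query still has to be scored against the whole corpus, and the "already extracted" ids from the first pass only add a membership check per candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)   # all-k == 10000
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def top_k(query, k=100):
    scores = corpus @ query              # full scan: per-query cost is unchanged
    return np.argsort(scores)[::-1][:k]

q1, q2 = rng.standard_normal((2, 384)).astype(np.float32)
seen = set(top_k(q1))                    # chunk ids extracted by the first pass

# Second pass: same full scan, plus an extra comparison layer against `seen`.
hits2 = top_k(q2)
novel = [i for i in hits2 if i not in seen]   # added filtering; no scan avoided
```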
Instead, if you are running multiple search queries for the same purpose, rather than being dismayed at seeing the same results come back, you can add a consolidation stage where you don't just de-duplicate: you turn that duplication into a higher weight.
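A hedged sketch of that consolidation stage: each rewritten query returns a ranked list of chunk ids, and instead of de-duplicating, repeats across lists accumulate score. The reciprocal-rank weighting used here is one illustrative choice (any monotone boost for agreement would do); the function and data are made up for the example.

```python
from collections import defaultdict

def consolidate(result_lists, k=60):
    """Fuse ranked hit lists; chunks returned by several queries score higher."""
    fused = defaultdict(float)
    for hits in result_lists:
        for rank, chunk_id in enumerate(hits):
            fused[chunk_id] += 1.0 / (k + rank)   # duplication becomes extra weight
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Three rewrites of the same question, overlapping on chunks 7 and 42.
runs = [[7, 42, 3, 19], [42, 7, 88, 5], [7, 13, 42, 61]]
print(consolidate(runs))   # 7 and 42 float to the top instead of being deduped away
```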
Architecturally, if you are I/O-limited, you can run the pre-refinement exhaustive searches on a reduced-bit-depth or lower-dimensionality copy that you keep in parallel with the full-quality vector database.
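One way that layout could look, sketched with NumPy: a small lossy side copy of the vectors (int8, truncated to the first 64 dims here, both arbitrary illustrative choices) is scanned exhaustively, and only the survivors are re-scored against the full-precision store.

```python
import numpy as np

rng = np.random.default_rng(1)
full = rng.standard_normal((10_000, 384)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Lossy parallel copy: first 64 dims, quantized to int8 (much cheaper to stream).
coarse = np.clip(full[:, :64] * 127, -127, 127).astype(np.int8)

def search(query, k=100, shortlist=1_000):
    q_coarse = np.clip(query[:64] * 127, -127, 127).astype(np.int8)
    rough = coarse.astype(np.int32) @ q_coarse.astype(np.int32)   # cheap exhaustive pass
    cand = np.argsort(rough)[::-1][:shortlist]
    exact = full[cand] @ query                                    # refine on full vectors
    return cand[np.argsort(exact)[::-1][:k]]

q = rng.standard_normal(384).astype(np.float32)
q /= np.linalg.norm(q)
print(search(q)[:10])
```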
Or, if you are time-limited and don't want to sit around for more query rewriting, you can run the search over a parallel chunking at a different chunk size and weight by the overlaps…
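A rough sketch of weighting by overlap across parallel chunkings, assuming hits come back as character spans with scores (the spans and scores below are invented for illustration): regions that both chunkings retrieve get their scores added together.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) character spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

# (span, score) hits from the small-chunk and large-chunk indexes.
small_hits = [((0, 200), 0.82), ((600, 800), 0.74)]
large_hits = [((0, 500), 0.79), ((2000, 2500), 0.70)]

weighted = []
for span_s, score_s in small_hits:
    boost = sum(score_l for span_l, score_l in large_hits if overlap(span_s, span_l) > 0)
    weighted.append((span_s, score_s + boost))   # agreement between chunkings adds weight

print(sorted(weighted, key=lambda x: x[1], reverse=True))
```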
New name: HyDE-HyDE-HyDE-HO
- multiple hypothetical document embeddings - hypothetical optimization