It’s somewhat of a new topic when we’re talking about this kind of back-and-forth between the user and the LLM.
But given your specific example, where the user says “Give me some examples”, my first thought is: detect the interrogative, pull in the LLM’s previous response, and query your data (assuming it holds the answer) with both dense and sparse vectors. Take the top results after reciprocal rank fusion, feed them into the prompt, insert “Give me some examples” back into the user field, and see what the output looks like.
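In code, that flow looks roughly like the sketch below. This is just a shape, not a specific vendor’s API: `dense_search`, `sparse_search`, `rrf_fuse`, and `call_llm` are hypothetical placeholders for whatever retrieval and LLM stack you’re on.

```python
from typing import Callable

def answer_followup(
    followup: str,                # e.g. "Give me some examples"
    previous_response: str,       # the LLM's prior answer in the thread
    dense_search: Callable[[str, int], list[str]],
    sparse_search: Callable[[str, int], list[str]],
    rrf_fuse: Callable[[list[list[str]]], list[str]],
    call_llm: Callable[[str, str], str],
) -> str:
    # "Give me some examples" carries almost no retrieval signal on its
    # own, so search on the previous response plus the follow-up.
    query = previous_response + "\n" + followup

    dense_hits = dense_search(query, 20)    # embedding similarity
    sparse_hits = sparse_search(query, 20)  # keyword / BM25-style
    top_docs = rrf_fuse([dense_hits, sparse_hits])[:5]

    system = (
        "Answer using this retrieved context:\n\n"
        + "\n\n".join(top_docs)
        + "\n\nPrevious assistant response:\n"
        + previous_response
    )
    # Put the follow-up back into the user field, untouched.
    return call_llm(system, followup)
```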
If your answer is in the data retrieved through dense and sparse (hybridized) search, then presenting that wall of info to the LLM is the best shot you’ve got. I believe Pinecone supports hybridized dense-and-sparse retrieval, and if not, Weaviate does. Me, I just spin my own version of both, so I can’t help you with the vendor specifics.
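For what it’s worth, if you do spin your own, reciprocal rank fusion itself is only a few lines. Here’s a minimal self-contained version; `k=60` is the constant from the original RRF paper (Cormack et al., 2009):

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs with reciprocal rank fusion.

    Each doc scores sum(1 / (k + rank)) over every list it appears in,
    so docs ranked well by both dense and sparse retrieval float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a dense ranking with a sparse (keyword) ranking.
dense = ["doc3", "doc1", "doc7"]
sparse = ["doc1", "doc9", "doc3"]
print(rrf_fuse([dense, sparse]))  # doc1 and doc3 come out on top
```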
I haven’t researched HyDE, but all it seems to do is “impedance match” one thing to another (a response, to the question, to an embedding). But I’m curious as to why you think it’s the way to go; maybe I need to look into HyDE.
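That “impedance match” reading is roughly right, from what I understand: HyDE has the LLM write a hypothetical answer to the question, embeds that answer, and searches with it, on the theory that an answer-shaped query matches answer-shaped documents better than the raw question does. A minimal sketch, assuming a `generate` LLM call and an `embed`/`search` pair for your vector store (all hypothetical placeholders):

```python
from typing import Callable

def hyde_search(
    question: str,
    generate: Callable[[str], str],        # LLM call, e.g. a chat completion
    embed: Callable[[str], list[float]],   # your embedding model
    search: Callable[[list[float]], list[str]],  # dense lookup in your store
) -> list[str]:
    # 1. Have the LLM hallucinate a plausible answer. Factual accuracy
    #    doesn't matter here; only its shape and vocabulary do.
    hypothetical = generate(f"Write a short passage answering: {question}")
    # 2. Embed the hypothetical answer instead of the question, so the
    #    query vector lives in "answer space" rather than "question space".
    vector = embed(hypothetical)
    # 3. Retrieve real documents near that vector.
    return search(vector)
```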