Yes, I’m aware of how semantic embeddings work, and of HyDE. My main point is that transforming a user’s query (especially as a starting point, without even trying the raw query against the database) seems like a bad idea. It adds a potentially unnecessary layer of complexity that makes it harder to understand what’s going on behind the scenes, and it can also “dilute” the meaning behind the user’s query.
It also relies on the LLM to answer the question in such a way that the answer aligns with the documents better than the question itself does. So, beyond tweaking and reshuffling the prompt, there really isn’t much room for improvement. If you find that HyDE is failing, what’s next?
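To make that concrete, here’s a toy sketch of the HyDE shape with the LLM stubbed out and a bag-of-words stand-in for a real embedding model (both are placeholders I made up, not anyone’s actual implementation). The point is that the hypothetical answer is the only thing retrieval sees, so if its generation drifts, retrieval drifts with it, and the prompt is your only knob:

```python
# Toy HyDE shape: embed a generated "hypothetical answer" instead of the
# query. embed() is a bag-of-words stand-in for a real embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm_answer(question: str) -> str:
    # Stand-in for the LLM call. In real HyDE this text is generated,
    # and retrieval quality hinges entirely on how it comes out.
    return "the capital of france is paris a large european city"

docs = ["paris is the capital and largest city of france",
        "berlin is the capital of germany"]

query = "what is the capital of france"
hypothetical = fake_llm_answer(query)

# Rank documents by similarity to the hypothetical answer, not the query.
scores = [cosine(embed(hypothetical), embed(d)) for d in docs]
```

Everything between the user’s question and the retrieved documents is hidden inside that one generation step, which is the debuggability problem.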
It definitely has its uses, don’t get me wrong, which is why I said “don’t go with it initially”.
For example, I have a very nuanced database that needs to return precise information. I rely heavily on keywords because of this (but also need semantic embeddings to understand the question). The database contains product names with misspellings baked in, plus dimensions, and either can drastically alter the results. This is a clear-cut case where using an LLM to alter the user’s query would be a terrible decision.
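A minimal sketch of that failure mode, with a made-up product name and a trivial token-overlap scorer standing in for real keyword search: when the LLM helpfully “corrects” the spelling and splits the dimension, the exact-match keywords stop hitting the document.

```python
# Toy illustration (made-up catalog entry) of an LLM rewrite breaking
# exact keyword matching against an intentionally misspelled product name.

def keyword_score(query: str, document: str) -> int:
    """Count query tokens that appear verbatim in the document."""
    doc_tokens = set(document.lower().split())
    return sum(tok in doc_tokens for tok in query.lower().split())

# The catalog stores the misspelled name exactly as it appears upstream.
doc = "Widgit-Pro 25mm mounting bracket anodized"

raw_query = "widgit-pro 25mm bracket"    # user copies the real name
rewritten = "widget pro 25 mm bracket"   # LLM "fixes" spelling and units

print(keyword_score(raw_query, doc))   # all three keywords hit
print(keyword_score(rewritten, doc))   # only "bracket" survives
```

Real keyword search (BM25 and friends) is fancier than this, but the core issue is the same: the rewrite destroys the exact tokens the index depends on.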
If it sounds like I’m repeating myself, I know. You didn’t address anything I said and instead went off on a tangent.