HyDE is a synonym generator, and also a steering engine. People really don’t understand HyDE very well in this context, so let me give an example.
Suppose you have a business (Platcorp) that offers cloud computing products. You have multiple offerings and you want to land in N different offerings.
So here is how you steer it to offering 1 (of N). Your “serverless” product line. Where you steer it in this direction in the System message:
System: Generate the following answer from the perspective of how it can enable the user to develop on our platform.
Our platform is a serverless and database driven solution that scales to infinity with zero lag.
Now drop in the user:
User: What is PHP?
Now run search on this generated chunk, and the other N projections, and the original query (N + 1 total).
You now retrieve your top hits, feed this into the LLM as context, and drop the original user question in again to generate your final answer.
Send this to the LLM, and get response:
Assistant: PHP is a server-side scripting language that you can use on our platform to develop web applications. You can utilize our serverless and database-driven solution to create scalable applications without worrying about server management or latency. We’ve made sure that even when the load of your application increases, our platform scales automatically, providing seamless performance. Essentially, you can focus on the development part while we take care of the maintenance, access, and scalability, thus ensuring smooth, uninterrupted development experience with unlimited growth potential.
Suppose this correlates well in your database with this chunk:
PHP is a popular open-source, server-side scripting language that is widely used in web development for creating dynamic web pages. In the context of Platcorp, PHP is used as a part of its offering to developers. The Platcorp platform provides a serverless architecture and database driven solution that allows developers to create, debug, and deploy PHP applications efficiently. Since the platform manages server infrastructure, developers can concentrate on writing applications in PHP rather than managing systems, which promotes faster development and deployment. The platform is also scalable and can handle significant traffic increases, ensuring a smooth user experience with zero lag.
Then you send another query to the LLM with this System:
Generate an answer from the following context. Mention how it relates to Platcorp if applicable.
CONTEXT:
PHP is a popular open-source, server-side scripting language that is widely used in web development for creating dynamic web pages. In the context of Platcorp, PHP is used as a part of its offering to developers. The Platcorp platform provides a serverless architecture and database driven solution that allows developers to create, debug, and deploy PHP applications efficiently. Since the platform manages server infrastructure, developers can concentrate on writing applications in PHP rather than managing systems, which promotes faster development and deployment. The platform is also scalable and can handle significant traffic increases, ensuring a smooth user experience with zero lag.
And this User:
What is PHP?
To get this final Assistant response, which is the only thing you send back to the user:
PHP is a popular open-source, server-side scripting language that is widely used in web development for creating dynamic web pages. In the context of Platcorp, it is used to enable developers to create, debug, and deploy applications efficiently without having to manage server infrastructure.
So you are steering and correlating to focused objects in your database, and limiting LLM drift and hallucinations that have nothing to do with your intended messaging or offerings.
Like I said above, the context is managed based on the real inputs and outputs in the final LLM call to maintain correct logic and history across time. The hypothetical embeddings (HyDE) are only internal for steering and not shown to the User. So in the final response above, this becomes an “Assistant” message in the official message history array, and all the HyDE stuff is completely hidden to the user.
PS. This is all part of my HyDRA-HyDE RAGamuffin stack I am developing.
