Hello all,
I have a system that does question answering with hybrid search (keyword search and vector search). Now I want to integrate HyDE to see what difference it makes to the generated answers.
Currently, my system works as follows (a rough sketch follows the list):
- User asks a question.
- ChatGPT converts the question into search terms.
- Perform keyword search on my database with the search terms (Elasticsearch in this case).
- Perform semantic search on Elasticsearch with the search terms.
- Use RRF (reciprocal rank fusion) to normalize, combine, and rerank the documents.
- Select all the documents that fit within a specified token budget, ask the question to ChatGPT along with the selected documents, and get the answer.
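For concreteness, here is a minimal sketch of that pipeline. `keyword_search()`, `semantic_search()`, `fetch()`, `count_tokens()`, and `ask_chatgpt()` are hypothetical placeholders for the Elasticsearch queries, the tokenizer, and the ChatGPT calls:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over legs of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:                      # each leg is a ranked list of doc ids
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def answer(question, token_budget=3000):
    terms = ask_chatgpt(f"Convert this question into search terms: {question}")
    keyword_ranked = keyword_search(terms)        # BM25 leg
    semantic_ranked = semantic_search(terms)      # vector/kNN leg
    fused = rrf_fuse([keyword_ranked, semantic_ranked])

    selected, used = [], 0
    for doc_id in fused:                          # greedily fill the token budget
        doc = fetch(doc_id)
        cost = count_tokens(doc)
        if used + cost > token_budget:
            break
        selected.append(doc)
        used += cost
    return ask_chatgpt(question, context=selected)
```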
Now, after reading some example implementations, I have trouble integrating HyDE into my overall process. In particular, where should I actually perform semantic search with the hypothetical answer? I have two approaches in mind but am not sure which is right.
- Should I do it during the semantic search step of my current process? That is, use the search terms for keyword search on the whole database, use the hypothetical answer for semantic search on the whole database, and then rerank the documents to complete the hybrid search?
- Should I do it on the results retrieved by keyword search? I think the downside is that the keyword search results are then prioritized, but the examples I found always do this (e.g. the example on the OpenAI site that retrieves results from a news API).
Can anyone help me? What is the best way to integrate hybrid search with HyDE? Thanks.
Maybe do a 3-leg RRF.
Here would be a potent combo:
- Normal embedding leg.
- Keyword leg.
- HyDE leg: Question → HyDE answer → correlate with embeddings or keywords or both → get additional rankings for RRF, depending on how many legs this turns into (see the sketch below).
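A minimal sketch of just the HyDE leg, assuming the OpenAI Python SDK (v1) and an Elasticsearch 8.x index with a `dense_vector` field named `embedding` (the index, field, and model names are illustrative):

```python
from openai import OpenAI

client = OpenAI()

def hyde_leg(question, es, index="docs", top_k=20):
    # 1) Have the model write a plausible (hypothetical) answer to the question.
    hyde_answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short passage that answers: {question}"}],
    ).choices[0].message.content

    # 2) Embed the hypothetical answer instead of the question itself.
    vector = client.embeddings.create(
        model="text-embedding-3-small", input=hyde_answer,
    ).data[0].embedding

    # 3) Vector search over the whole index; the resulting ranking is one RRF leg.
    hits = es.search(index=index, knn={"field": "embedding",
                                       "query_vector": vector,
                                       "k": top_k, "num_candidates": 200})
    return [hit["_id"] for hit in hits["hits"]["hits"]]
```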
Hi. Yes, I am considering this as well, but I fear the overall system might become inefficient if I do lots of rerankings.
@curt.kennedy regarding the HyDE leg, is there a big difference if I do:
- Do keyword search.
- Do semantic search.
- RRF.
- Correlate the ranked results with the HyDE answer by reranking again.
vs this one?
- Do keyword search.
- Do semantic search.
- Correlate the keyword results with the HyDE answer.
- Correlate the semantic results with the HyDE answer.
- Rerank using RRF.
I wasn’t thinking of correlating previous results with the HyDE answer.
I was thinking you get pure answers from the original text by using HyDE, semantics, and keywords. So this is (3) streams of rankings, and then use RRF to fuse them into one ranking.
The only nuance is that HyDE could be considered a sort of new query (it produces a synthetic query from the original one). So with this you could do a 2-leg RRF: one leg with semantics on the HyDE generated query, one with keywords on the HyDE generated query.
So putting all this together, you have (4) streams (max) to fuse in RRF.
- Semantic on original query
- Keywords on original query
- Semantic on HyDE generated query
- Keywords on HyDE generated query
All of these can be run in parallel, and have no dependencies between them. So do this, and when the last one finishes, fuse them all to a single ranking using RRF.
This is different from what you are saying above, because I am not correlating results from keyword or semantic search with anything from HyDE. I am treating each leg as an independent processing stream, which is good for lowering the latency of the overall search.
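A rough sketch of that shape, with `generate_hyde()`, `semantic_leg()`, and `keyword_leg()` as hypothetical async wrappers around the ChatGPT call and the Elasticsearch queries, and `rrf_fuse()` the helper from the first sketch:

```python
import asyncio

async def four_leg_search(question):
    # The HyDE query is generated once up front; after that, the four legs
    # have no dependencies on each other and run concurrently.
    hyde_query = await generate_hyde(question)
    rankings = await asyncio.gather(
        semantic_leg(question),      # 1) semantic on original query
        keyword_leg(question),       # 2) keywords on original query
        semantic_leg(hyde_query),    # 3) semantic on HyDE generated query
        keyword_leg(hyde_query),     # 4) keywords on HyDE generated query
    )
    return rrf_fuse(rankings)        # fuse all four into a single ranking
```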
I’m curious why RRF is a good idea in combination with HyDE. As far as I understand it, each leg in RRF has ranks that are a permutation of the first n natural numbers? This would mean the differing relevances of the legs are not preserved but normalized away by RRF. If I further understand HyDE correctly, each query is speculative by nature, possibly causing high variance in the relevance of results across different HyDE queries. Would another method, such as averaging/summing relevance scores directly, not be more appropriate to account for this variance? Or at least some weighting/hybrid approach?
Looking forward to improving my understanding.
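For reference, the standard RRF score uses only a document's rank in each leg, so the raw relevance scores are indeed discarded:

$$\mathrm{RRF}(d) = \sum_{i \in \text{legs}} \frac{1}{k + \mathrm{rank}_i(d)}, \qquad k \approx 60$$

A common tweak is the weighted variant $\sum_i w_i / (k + \mathrm{rank}_i(d))$, which would let you down-weight a more speculative leg such as HyDE.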
HyDE formulates a related question/query and then uses this synthesized version instead of the original one. But the two are related, so fusing them with something like RRF makes sense: the retrieved targets are similar because the original and HyDE generated queries are similar.
I also added an extra step before including the found documents, which does the following:
Runs a separate prompt (in parallel) over each found document to confirm it either contains the answer to the question or related/additional information the answering model will need to answer it.
A “stupider”/fine-tuned model can be used for this simple operation to reduce costs. I also use a “binary” one-token response (0/1) for this step.
I find it useful as it trims the found docs down to what is really needed by the more expensive answering model, and it tends to improve the answer prompt/response quality by a lot.
The only downside is that it adds 2-5 seconds to the reply time (but in my use case I run dozens of questions in parallel, and a couple of seconds either way do not matter compared to the benefits).
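A minimal sketch of that gate, assuming the OpenAI Python SDK; the model name and prompt wording are illustrative:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def is_relevant(question, doc):
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",          # a cheaper model handles the binary check
        max_tokens=1,                 # force a one-token 0/1 answer
        messages=[{"role": "user", "content":
            f"Question: {question}\n\nDocument:\n{doc}\n\n"
            "Answer 1 if the document contains the answer or information "
            "needed to answer the question, otherwise answer 0."}],
    )
    return resp.choices[0].message.content.strip() == "1"

async def filter_docs(question, docs):
    # Run all checks in parallel, keep only the documents flagged relevant.
    flags = await asyncio.gather(*(is_relevant(question, d) for d in docs))
    return [d for d, keep in zip(docs, flags) if keep]
```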
Have you thought about including a set of text chunks containing the answer as a “calibration vector” along with the query, if that is possible/makes sense in your application?
In my use case, excerpts from legal provisions that are likely to contain the answer are vectorized separately, then the query vector is pushed slightly toward the center of the excerpt vectors to improve the document piece targeting. Works like a charm.
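A sketch of that nudge; the blend factor `alpha` is an assumption (the post above doesn't say how far to push):

```python
import numpy as np

def calibrate_query(query_vec, excerpt_vecs, alpha=0.3):
    # Pull the query embedding part-way toward the centroid of the excerpt
    # embeddings, then re-normalize for cosine similarity search.
    centroid = np.mean(excerpt_vecs, axis=0)
    shifted = (1 - alpha) * np.asarray(query_vec) + alpha * centroid
    return shifted / np.linalg.norm(shifted)
```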
I’d be careful with HyDE. There are (IMO) much better methods for transforming the query.
Here’s a nice quick image from an OpenAI presentation with similar insights:
Here is the video of that talk:
But HyDE, at least the way I am talking about it, is a form of query expansion … but I will have to check out the video 
Thanks!
The graph-related stuff is around 16:00. The X means they did NOT use it in production, while a check means it was included. So everything was included BESIDES fine-tuning and HyDE.
Also, I never knew that they use a SQL database for structured retrieval. Neat. It’s definitely required there (and makes sense). I just never used Retrieval for this kind of data and instead use tool calling myself.
I recently watched the presentation. It looks like their “query expansion” was taking a list of input questions, breaking them out separately, executing the searches in parallel, and then fusing the results back into context for the model (sketch below).
In all cases though, they got the 98% without a fine-tune! This is encouraging!
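That style of expansion might look something like this; `split_questions()` stands in for a ChatGPT call that returns a list of sub-questions, and `search()` is any of the legs discussed above:

```python
import asyncio

async def expanded_search(user_input):
    questions = await split_questions(user_input)   # e.g. ["What is X?", "How does Y work?"]
    rankings = await asyncio.gather(*(search(q) for q in questions))
    return rrf_fuse(rankings)                       # fuse back into one ranking for the context
```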
I loved the emphasis on “fuck fine-tuning”.
Intuitively it makes sense to do, but damn, guess not.
Then again, this is a general-purpose, catch-all RAG. Maybe for a more niche/focused topic something like fine-tuning and HyDE can be beneficial. Big maybe.
Thinking further, it does kind of make sense that a general-purpose RAG would work best with a general-purpose model.
The fine-tune makes sense for getting formats right, or tone, or absorbing the essence of the fine-tuned data to allow for shorter, lower-token prompts.
The SQL they used had more to do with extracting certain figures for the model. But the solution was providing correct context in the prompt. So “tooling” (SQL) was needed there.
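As a sketch of that tooling pattern, here is what exposing an exact-figure SQL lookup as a callable function might look like; the tool schema follows the OpenAI function-calling format, and the table, column, and tool names are made up:

```python
# Tool definition the model can choose to call when it needs an exact figure.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_figure",
        "description": "Fetch an exact numeric figure from the database.",
        "parameters": {
            "type": "object",
            "properties": {
                "metric": {"type": "string", "description": "e.g. 'revenue'"},
                "year": {"type": "integer"},
            },
            "required": ["metric", "year"],
        },
    },
}]

def lookup_figure(conn, metric, year):
    # Parameterized query against a hypothetical metrics table (sqlite-style).
    row = conn.execute(
        "SELECT value FROM metrics WHERE name = ? AND year = ?", (metric, year)
    ).fetchone()
    return row[0] if row else None
```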
It all depends on where the context retrieval is breaking down. That’s why RAG is interesting, it all depends on drilling down to what is going wrong in retrieval, and then solving it.
Golden words. I could add that good retrieval often means several steps (sometimes parallel, like searching by one column while running a parallel search on another field, then combining everything together, etc.).
A good thing to do is to analyze the search process: what are we looking for, how do we build the query, do we need one or several queries, how do we run the query, how do we select results, how do we use the found results to build the prompts, etc. My rule of thumb: do it manually, slow way down in your thinking, try to capture the whole decision-making process and the criteria it needs, write it down, and see if there are exceptions or other scenarios. Then build the complete retrieval workflow and see how it can be implemented.
For at least a year or so, even a simple “quick search” has always ended up as a several-step process in my projects. And only after comparing the results does the feeling of hammering in screws go away.
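As one concrete example of such a multi-step search, here is a sketch of two field-level queries run in parallel and then combined; it assumes the async Elasticsearch Python client and illustrative field names:

```python
import asyncio
from elasticsearch import AsyncElasticsearch

async def search_field(es, index, field, text, size=20):
    resp = await es.search(index=index, query={"match": {field: text}}, size=size)
    return [hit["_id"] for hit in resp["hits"]["hits"]]

async def multi_field_search(es: AsyncElasticsearch, text, index="docs"):
    # Query two different fields concurrently, then combine the rankings
    # (e.g. with the RRF helper from earlier in the thread).
    title_hits, body_hits = await asyncio.gather(
        search_field(es, index, "title", text),
        search_field(es, index, "body", text),
    )
    return rrf_fuse([title_hits, body_hits])
```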
Often, starting by modeling the retrieval is easier, as it helps you better understand the structure of the data you need to store, even if that feels counter-intuitive. When you know exactly what you will be looking for and how you'll be looking for it, drawing up the schema of the data you'll be storing is much faster.