Asking follow-up questions with retrieval

I’m supporting a simple Q&A bot project using GPT-3.5 and retrieval with Pinecone. We’re looking for a standardized way to have it query the vector database for follow-up questions and return relevant info even if the user says something like “Give me some examples”.

It seems partly resolved by having GPT spin up a question based on the context of the last turn and the current user question.

But I’m wondering how others do this, and whether there are any examples of existing projects, YouTube videos, or web docs on the subject that I can pass on.

I’ve found HyDE, which seems like the way to go for me, but I’m not the one who decides.

Thanks for any help on this.


It’s somewhat of a new topic when we are talking about the LLM asking questions back to the person.

But given your specific example, where the user says “Give me some examples”, my first thought is: detect the interrogative, then provide the previous response from the LLM as context. Query your data (assuming it has the answer) with both dense and sparse vectors, take the top reciprocal rank fusion results, feed those into the prompt, insert “Give me some examples” back into the user field, and see what the output looks like.

If your answer is in the data retrieved through dense and sparse (hybridized) search, then presenting this wall of info to the LLM is the best shot you’ve got. I believe Pinecone supports dense-and-sparse hybridized retrieval; if not, Weaviate does. Me, I just spin my own version of both, so I can’t help you there. :sunglasses:
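Reciprocal rank fusion itself takes only a few lines. Here is a minimal sketch in Python; the document ids are illustrative, and `k=60` is the smoothing constant commonly used with RRF (any similar value works):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in (rank is 1-based).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# dense and sparse retrieval each return their own ordering
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense, sparse])
# doc_b ranks first because it placed well in both lists
```

The nice property here is that only ranks matter, so you never have to reconcile dense cosine scores with sparse BM25-style scores on a common scale.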

I haven’t researched HyDE, but all it seems to do is “impedance match” one thing to another (response to question to embedding). But I’m curious as to why you think it’s the way to go; maybe I need to look into HyDE.


The idea behind HyDE is that the structure of a piece of text is very relevant to the embedding.

So, with a toy example,

If you have an embedding of “The ball is blue.” which you want to retrieve, the embedding of the text “The ball is red.” will be more similar to it than the embedding of the question “What color is the ball?”

I would need to test this toy example to verify, but that’s the basic idea behind HyDE.

You construct a “hypothetical” document as an answer to a prompt, and the embedding of that hypothetical document will be closer to the embedding of the document that you want to retrieve than the prompt itself would be—it doesn’t matter if the facts are wrong.
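The pipeline can be sketched in a few lines. Note that `toy_generate` and `toy_embed` below are hypothetical stand-ins: a real system would call an LLM and an embedding model, and the corpus vectors would come from your vector store.

```python
import numpy as np

def hyde_retrieve(question, corpus_vectors, corpus_texts, generate, embed):
    """HyDE: embed a hypothetical *answer* instead of the question itself."""
    hypothetical = generate(question)   # an LLM call in a real system
    q_vec = embed(hypothetical)
    sims = corpus_vectors @ q_vec       # cosine similarity (unit vectors)
    return corpus_texts[int(np.argmax(sims))]

# --- toy stand-ins; a real system calls an LLM + embedding model ---
def toy_generate(question):
    return "The ball is red."           # plausible, factually wrong answer

def toy_embed(text):
    # hypothetical bag-of-words "embedding", normalized to unit length
    vocab = ["ball", "blue", "red", "color", "dog"]
    v = np.array([float(w in text.lower()) for w in vocab])
    return v / np.linalg.norm(v)

corpus_texts = ["The ball is blue.", "The dog is old."]
corpus_vectors = np.stack([toy_embed(t) for t in corpus_texts])
result = hyde_retrieve("What color is the ball?", corpus_vectors,
                       corpus_texts, toy_generate, toy_embed)
# retrieves "The ball is blue." even though the hypothetical said "red"
```

The point of the sketch is the shape of the flow: generate, embed the generation, search—the hypothetical answer’s wrong fact (“red”) doesn’t matter because its structure matches the target document.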

I’ve experimented with HyDE a bit and I’ve found it to be hit or miss. In situations where it hits though, it’s very good.


Right, that’s my understanding too. It basically formulates a hypothetical “close” document that you can then retrieve against with better precision in your own data, and that information is then brought to the LLM to produce the response.

You mention it is “hit or miss” and I can believe it, since it is formulating outside the space of the user’s data (in this case, @wswitzer’s).

But … if there is any slight chance that the dense (embeddings) and sparse (keywords) representations contain the information, directly in the user’s database, I would think that would be better, right? Otherwise, it means the database doesn’t contain the answer.

So, I guess: cut out the middleman and go direct to your data without a proxy. The belief here is that we are often in a keyword-starved domain if we only consider dense “meaning” vectors.

It’s a philosophical statement, I know. It’s just a hunch. Which is why I think “proxy correlations” (HyDE) might smooth things over, and are interesting, but are hit or miss because they operate outside the domain of the user’s data. If you focus entirely on the user’s data and the answer does not appear, it’s either the user’s fault or a prompt-engineering problem … both of which can be worked on.



I think the hit-or-miss aspect of my experience might have more to do with the quality/quantity of my embeddings when I was toying with HyDE than anything else.

I’m working on a big embedding project now (about 30,000 math-heavy scientific papers), but I need to solve some (many) data cleaning issues before I really get into it.

Once I do, I’ll re-visit HyDE in earnest and report back.

My current plan is to try an iterative HyDE approach: start from, say, a naive keyword search to load a bunch of document parts into context, then generate a hypothetical response to pull (hopefully) better document parts into context for a revised response.
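That iterative loop could be sketched like this. All the search and generation callables here are hypothetical stubs standing in for real keyword search, vector search, and LLM calls:

```python
def iterative_hyde(question, keyword_search, generate, vector_search, rounds=2):
    """Seed context with a naive keyword search, then repeatedly generate
    a hypothetical response and use it to pull better passages."""
    context = keyword_search(question)
    answer = generate(question, context)
    for _ in range(rounds - 1):
        context = vector_search(answer)   # embed + search in a real system
        answer = generate(question, context)
    return answer

# toy stubs; real versions would call search indexes and an LLM
kw = lambda q: ["passage found by keywords"]
gen = lambda q, ctx: f"answer grounded in: {ctx[0]}"
vs = lambda a: ["better passage found via the hypothetical answer"]
result = iterative_hyde("What color is the ball?", kw, gen, vs)
```

The `rounds` cap matters in practice: each iteration is an extra LLM call plus a retrieval, so you would want to stop early once the retrieved passages stop changing.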


In general though I am cautiously optimistic on HyDE.


I consider part of an advanced conversation management system to be giving the conversation a threaded topic pattern and index, relying on language-AI calls to determine when a shift in topic has taken place.

In this way, you can ensure all of the current topic’s conversation turns are reproduced (say, for example, I want all six Q&As about writing HTML, but I want the manager to automatically disregard the prior JavaScript chat that happened before it recognized a tonal shift in the conversation).

This also allows you to gather a group of user questions that are all from the same line of questioning; those like questions can then be combined into an embedding that is similarity-matched against your database.


Here are the results of a “toy” HyDE example:

                        The ball is blue.
The ball is blue.                  1.0000
The ball is yellow.                0.9384
The ball is purple.                0.9365
The ball is green.                 0.9356
The ball is white.                 0.9324
The ball is pink.                  0.9279
The ball is red.                   0.9269
What color is the ball?            0.9217
The ball is black.                 0.9214
The ball is orange.                0.9183

Now, this is very much a toy example. But, here we can see that almost all color “guesses” do better than the simple question.

I expanded on this test quite a bit here:

If we look at how the question performs across a greater array of colors, the answer gets a bit murkier. Looking at the distribution of where the question ranks among all guessed colors for each ground-truth color, the question is essentially no better or worse than a random guess: the question embedding beats a randomly guessed color about half the time.

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.08392 0.41958 0.50350 0.49259 0.58042 0.80420

That said, there are a lot of weird colors in there (e.g. papaya whip and bisque)

If we look at the median similarity for each guess among all the colors we can see that the colors we might expect GPT-4 to “guess” tend to do a bit better than the question.

Big Table
HyDE Median Similarity
The ball is light blue. 0.9141
The ball is dark blue. 0.9140
The ball is violet. 0.9102
The ball is pale turquoise. 0.9098
The ball is light sky blue. 0.9088
The ball is dark green. 0.9084
The ball is blue. 0.9083
The ball is dark gray. 0.9081
The ball is dark grey. 0.9081
The ball is dark cyan. 0.9079
The ball is light gray. 0.9078
The ball is dark red. 0.9075
The ball is light green. 0.9073
The ball is grey. 0.9073
The ball is gray. 0.9073
The ball is lavender. 0.9069
The ball is light grey. 0.9065
The ball is sky blue. 0.9065
The ball is dark turquoise. 0.9064
The ball is light yellow. 0.9062
The ball is beige. 0.9062
The ball is pale green. 0.9058
The ball is pink. 0.9050
The ball is cyan. 0.9049
The ball is yellow. 0.9044
The ball is light cyan. 0.9042
The ball is brown. 0.9039
The ball is medium blue. 0.9039
The ball is blue violet. 0.9037
The ball is light pink. 0.9033
The ball is yellow green. 0.9032
The ball is dark orange. 0.9029
The ball is pale violet red. 0.9028
The ball is navy blue. 0.9026
The ball is white. 0.9026
The ball is dim gray. 0.9025
The ball is violet red. 0.9024
The ball is khaki. 0.9022
The ball is green. 0.9022
The ball is dark sea green. 0.9021
The ball is rosy brown. 0.9020
The ball is purple. 0.9019
The ball is dark violet. 0.9018
The ball is sea green. 0.9018
The ball is dim grey. 0.9015
The ball is light sea green. 0.9013
The ball is green yellow. 0.9009
The ball is turquoise. 0.9009
The ball is lime green. 0.9000
The ball is navy. 0.8999
The ball is tan. 0.8994
The ball is black. 0.8993
The ball is midnight blue. 0.8992
The ball is dark salmon. 0.8988
The ball is maroon. 0.8987
The ball is dark slate blue. 0.8984
The ball is deep pink. 0.8982
The ball is royal blue. 0.8980
The ball is powder blue. 0.8977
The ball is light slate blue. 0.8974
The ball is dark olive green. 0.8973
The ball is orange. 0.8972
The ball is light goldenrod. 0.8970
The ball is azure. 0.8969
The ball is red. 0.8967
The ball is magenta. 0.8960
The ball is light steel blue. 0.8960
The ball is medium purple. 0.8958
The ball is slate blue. 0.8954
The ball is steel blue. 0.8950
The ball is sandy brown. 0.8947
The ball is light slate grey. 0.8947
The ball is light goldenrod yellow. 0.8946
The ball is light salmon. 0.8945
The ball is light slate gray. 0.8943
The ball is orange red. 0.8942
What color is the ball? 0.8941
The ball is dark slate gray. 0.8936
The ball is slate grey. 0.8934
The ball is light coral. 0.8934
The ball is medium sea green. 0.8932
The ball is deep sky blue. 0.8932
The ball is slate gray. 0.8929
The ball is hot pink. 0.8923
The ball is dark goldenrod. 0.8922
The ball is pale goldenrod. 0.8918
The ball is dark slate grey. 0.8916
The ball is dark orchid. 0.8916
The ball is cornflower blue. 0.8911
The ball is coral. 0.8908
The ball is dark magenta. 0.8906
The ball is forest green. 0.8900
The ball is chartreuse. 0.8897
The ball is medium violet red. 0.8892
The ball is floral white. 0.8890
The ball is dodger blue. 0.8888
The ball is spring green. 0.8885
The ball is olive drab. 0.8884
The ball is dark khaki. 0.8883
The ball is alice blue. 0.8877
The ball is aquamarine. 0.8872
The ball is lawn green. 0.8869
The ball is gold. 0.8867
The ball is medium spring green. 0.8867
The ball is cadet blue. 0.8862
The ball is misty rose. 0.8855
The ball is lavender blush. 0.8855
The ball is ghost white. 0.8851
The ball is medium slate blue. 0.8851
The ball is sienna. 0.8849
The ball is salmon. 0.8849
The ball is plum. 0.8840
The ball is cornsilk. 0.8839
The ball is medium turquoise. 0.8838
The ball is orchid. 0.8836
The ball is chocolate. 0.8834
The ball is ivory. 0.8824
The ball is antique white. 0.8805
The ball is linen. 0.8793
The ball is goldenrod. 0.8787
The ball is seashell. 0.8780
The ball is tomato. 0.8778
The ball is mint cream. 0.8771
The ball is gainsboro. 0.8765
The ball is saddle brown. 0.8762
The ball is wheat. 0.8758
The ball is moccasin. 0.8751
The ball is white smoke. 0.8751
The ball is peach puff. 0.8742
The ball is medium orchid. 0.8738
The ball is indian red. 0.8737
The ball is thistle. 0.8727
The ball is snow. 0.8724
The ball is blanched almond. 0.8722
The ball is old lace. 0.8721
The ball is lemon chiffon. 0.8662
The ball is burly wood. 0.8644
The ball is medium aquamarine. 0.8642
The ball is firebrick. 0.8636
The ball is navajo white. 0.8588
The ball is honeydew. 0.8578
The ball is bisque. 0.8576
The ball is peru. 0.8576
The ball is papaya whip. 0.8459

Now, again, this is a toy example. To show just how important the structure of an embedded text is, even something like “The dog is old.” has an average similarity against the ground-truth colors of about 0.78. When the embedded documents are more complex, I will expect a hypothetical document to substantially outperform the base query.

Ultimately though—as is the case with most things—I suspect a combination approach will be the most effective. Combining some keyword filtering to target a HyDE-based similarity search to those documents most likely to contain relevant information then, perhaps, assessing each individually to determine if it is relevant to the question and tossing it if it is not, ingesting the remaining documents, and generating another response. Then, finally, as a last resort iterating this process as necessary.

One of the other things I am planning to investigate is “synthetic” embeddings, which I wrote about briefly here: Text Pre-processing for text-embedding-ada-002 - #2 by elmstedt

Basically, take whatever text information you want to embed and re-write it a bunch of times: summarize it, expand it, explain it to a five-year-old, etc. The idea is that it will be much easier to find the appropriate information if it is expressed in many different ways in the embedding space.
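A minimal sketch of that idea, assuming hypothetical `llm` and `embed` callables (real versions would hit an LLM and an embedding-model API). Every rewritten variant is stored pointing back at the same source text:

```python
def synthetic_embeddings(text, llm, embed):
    """Embed several rewrites of the same source text so a query can
    match whichever phrasing lands closest in embedding space."""
    prompts = ["Summarize this:", "Expand on this:",
               "Explain this to a five-year-old:"]
    variants = [text] + [llm(f"{p}\n\n{text}") for p in prompts]
    # every variant maps back to the same source document
    return [(embed(v), text) for v in variants]

# toy stand-ins for the LLM and embedding calls
entries = synthetic_embeddings(
    "Angular momentum is conserved.",
    llm=lambda prompt: prompt.split("\n\n")[-1],  # echoes the text back
    embed=lambda t: [float(len(t))],              # placeholder "vector"
)
# four entries (original + three rewrites), all pointing at the source
```

At query time, any of the four vectors can match the search; a hit on any variant returns the one underlying source document, so deduplication by source is the only extra bookkeeping.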

Again, once I am to the point in my project of actually embedding things, I can do a full work-up on the differences I experience between using and not using HyDE and all the other things I want to try.


I keep all the previous questions in the conversation, and the embeddings for each of those.
I then weight each previous question with value 0.2, and the current question with 1.0, and sum those embeddings together. Then I re-normalize to magnitude 1 again.
Then I re-run the retrieval for each iteration.

This seems to work surprisingly well for my use case. The value 0.2 was experimentally derived – I initially thought it should be lower, but most questions are in the same general area, so they tend to be close in embedding space anyway, and a high enough weight can compensate for the last question (with 1.0 weighting) being something like “tell me more”
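That weighting scheme is only a few lines with NumPy. A minimal sketch, using the experimentally derived 0.2 history weight described above (the toy 2-D vectors stand in for real embedding vectors):

```python
import numpy as np

def blended_query_vector(question_vecs, history_weight=0.2):
    """Weight prior questions at 0.2 and the current question at 1.0,
    sum the embeddings, then re-normalize to unit length."""
    weights = np.full(len(question_vecs), history_weight)
    weights[-1] = 1.0                   # the current question dominates
    blended = (weights[:, None] * np.asarray(question_vecs)).sum(axis=0)
    return blended / np.linalg.norm(blended)

# toy 2-D vectors: one prior question, then the current question
prior = np.array([1.0, 0.0])
current = np.array([0.0, 1.0])
q = blended_query_vector([prior, current])
# q is unit length and leans heavily toward the current question
```

Re-normalizing matters because cosine-similarity search assumes unit vectors; without it, the blended vector’s magnitude would grow with conversation length.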