I still don’t understand why their architecture does:
“chat history + new question → submit to gpt to generate new standalone question → vector search with new standalone question”
instead of “chat history + new question → vector search”.
This results in three OpenAI API calls per query: creating the standalone question, vectorizing the question, and submitting the standalone question plus context documents.
I’m also not clear on what goes into the chat history. Is it the original questions, the context documents submitted, the standalone questions created, and the responses, or just the standalone questions and responses?
I explained this idea of the stand-alone question here. Glad to see that it’s being used in other places!
You need to “de-contextualize” the question in a conversational context to do a proper semantic search. Imagine the conversation:
User: What is the capital of Spain?
Assistant: It is Madrid.
User: How many people live there?
If you embed the question “How many people live there?” and run the semantic search, you won’t retrieve documents that talk specifically about the population of Madrid. This is because it is a “contextual” question: it only makes sense in the context of the ongoing conversation. You can solve this with a module that de-contextualizes the contextual question into a “stand-alone” one, something like “What is the population of Madrid?”
You can easily achieve this with an additional call to OpenAI. In my case, the “chat history” is composed only of the previous QA pairs: no documents. You don’t need the supporting documents to reformulate the contextual question into a stand-alone one: the previous utterances are enough. In fact, I only send three QA pairs, and that’s usually more than enough to produce the stand-alone question.
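A minimal sketch of the input to that extra call, assuming the history is kept as a list of (question, answer) tuples. The function name `build_rewrite_prompt`, the prompt wording, and the `max_pairs` parameter are illustrative choices, not part of any library. The resulting string would be sent as the user message of a chat completion request.

```python
from typing import List, Tuple

QAPair = Tuple[str, str]  # (user question, assistant answer)


def build_rewrite_prompt(
    history: List[QAPair], question: str, max_pairs: int = 3
) -> str:
    """Build the prompt for the de-contextualizing call.

    Only the last `max_pairs` QA pairs are included, and no supporting
    documents are sent. The wording below is a hypothetical example.
    """
    # history[-0:] would return everything, so handle zero explicitly.
    recent = history[-max_pairs:] if max_pairs > 0 else []
    lines = ["Conversation so far:"]
    for asked, answered in recent:
        lines.append(f"User: {asked}")
        lines.append(f"Assistant: {answered}")
    lines.append(f"Follow-up question: {question}")
    lines.append("Rewrite the follow-up as a stand-alone question:")
    return "\n".join(lines)


history = [("What is the capital of Spain?", "It is Madrid.")]
print(build_rewrite_prompt(history, "How many people live there?"))
```

Keeping the window as a parameter makes it easy to try more or fewer pairs without touching the rest of the pipeline.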
Sometimes a conversation of 8 message pairs is required to fully contextualize and answer the question.
Sometimes a conversation of 12 pairs only needs the last message.
Good insights. Obviously, in the answering stage you try to send a large portion of the ongoing conversation (in combination with the supporting docs and the contextual question itself). As much info as you can.
But for de-contextualizing alone, in my experience a small number of utterances is usually more than enough. This is because each of these interactions already carries a lot of context about the current conversational topic. I hardly ever (or never) run into situations where, to fully characterize the stand-alone question, I need to go back more than 5 utterances.
Also: you need to explicitly instruct the module to return the question unchanged if the “contextual” question is already a “stand-alone” one that can be understood without the previous context.
Anyway, as this “de-contextualizer” only has previous QA pairs as input, you have a lot of tokens to play with. You can easily pass many more previous utterances if your conversational context requires it.
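Both points can be sketched roughly as follows: an instruction that tells the model to pass stand-alone questions through untouched, and a helper that keeps as many recent QA pairs as fit a token budget. `REWRITE_INSTRUCTION`, `estimate_tokens`, and `select_recent_pairs` are hypothetical names, and the four-characters-per-token figure is only a rough assumption; a real tokenizer would give exact counts.

```python
from typing import List, Tuple

QAPair = Tuple[str, str]  # (user question, assistant answer)

# Hypothetical instruction text, including the pass-through rule.
REWRITE_INSTRUCTION = (
    "Given the conversation and a follow-up question, rewrite the follow-up "
    "as a stand-alone question. If the follow-up can already be understood "
    "without the conversation, return it unchanged."
)


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def select_recent_pairs(history: List[QAPair], max_tokens: int) -> List[QAPair]:
    """Keep the most recent QA pairs whose combined estimate fits max_tokens."""
    selected: List[QAPair] = []
    used = 0
    # Walk backwards from the newest pair and stop once the budget is spent.
    for asked, answered in reversed(history):
        cost = estimate_tokens(asked) + estimate_tokens(answered)
        if used + cost > max_tokens:
            break
        selected.append((asked, answered))
        used += cost
    selected.reverse()  # restore chronological order
    return selected
```

With a generous budget this returns the whole history, and with a tight one it degrades to the last pair or two, which matches the observation that a few utterances are usually enough.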
I have not done this yet. But, I have completed the coding of my chat completion program, and it is working like a charm. I am chatting with my documents, and loving it! It’s really remarkable how little information, as you state, is necessary to maintain the context in chat history.
As I sit here in utter amazement at this accomplishment, I just wanted to thank you again for your assistance in helping me understand this process.
This “standalone question” is really working super-well for me. I’m working on a project to create a semantic search module for the Drupal CMS, and did this little query demonstration which is coded in large part based on what I learned in this issue thread.
I understand that the standalone question is used for the similarity search, but it’s not clear to me whether that is also the question used to generate the answer, or whether the original question is used.
I had mixed results with each. How should this be done?
I create a “concept” question from the standalone question for similarity search (context retrieval). I then submit the standalone question + context documents to the LLM for the response.
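That flow could be sketched as below, with every model call injected as a plain function so the sketch runs without any API access. The names `rewrite`, `to_concept`, `retrieve`, and `answer` are hypothetical stand-ins for the de-contextualizing call, the concept-question call, the vector search, and the final completion; none of them is a real library function.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (user question, assistant answer)


def answer_query(
    question: str,
    history: List[QAPair],
    rewrite: Callable[[List[QAPair], str], str],
    to_concept: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> str:
    """Run one query through the stand-alone-question flow."""
    # 1. De-contextualize the new question using the chat history.
    standalone = rewrite(history, question)
    # 2. Derive the "concept" question that is used only for retrieval.
    concept = to_concept(standalone)
    # 3. Similarity search with the concept question.
    documents = retrieve(concept)
    # 4. Answer from the stand-alone question plus the retrieved documents.
    return answer(standalone, documents)


if __name__ == "__main__":
    # Stub functions with canned values, for illustration only.
    reply = answer_query(
        "How many people live there?",
        [("What is the capital of Spain?", "It is Madrid.")],
        rewrite=lambda history, q: "What is the population of Madrid?",
        to_concept=lambda q: "population of Madrid",
        retrieve=lambda query: [f"<document matching '{query}'>"],
        answer=lambda q, docs: f"Answering '{q}' with {len(docs)} document(s)",
    )
    print(reply)
```

Note that the original question never reaches the answering step in this sketch; swapping `standalone` for `question` in step 4 is the one-line change to compare the two approaches.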