Chat Completion Architecture

I watched this video: “GPT-4 & LangChain Tutorial: How to Chat With A 56-Page PDF Document (w/Pinecone)” on YouTube.

I’m curious about this chat completion architecture:

I still don’t understand why, in their architecture, they:

“chat history + new question → submit to gpt to generate new standalone question → vector search with new standalone question”

instead of “chat history + new question → vector search”.

This creates three OpenAI API calls per query: one to create the standalone question, one to vectorize the standalone question, and one to answer with the standalone question + context documents.
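The three-call flow described above can be sketched as follows. Note that `llm`, `embed`, and `vector_search` are placeholders standing in for the chat completion endpoint, the embeddings endpoint, and the Pinecone lookup; none of this is the tutorial’s actual code.

```python
def answer(chat_history, new_question, llm, embed, vector_search):
    """Sketch of the condense-then-retrieve pipeline (three API calls)."""
    # Call 1 (chat completion): condense history + follow-up
    # into a standalone question
    standalone = llm(
        "Rephrase the follow-up question as a standalone question.\n"
        f"History: {chat_history}\nFollow-up: {new_question}"
    )
    # Call 2 (embeddings): vectorize the standalone question,
    # then run the semantic search against the vector store
    docs = vector_search(embed(standalone))
    # Call 3 (chat completion): answer using the standalone
    # question plus the retrieved context documents
    return llm(f"Context: {docs}\nQuestion: {standalone}")
```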

And I’m also not clear on what exactly goes into the chat history: the original questions, the context documents submitted, the standalone questions created, and the responses? Or just the standalone questions and responses?

Could someone help me out here?


I explained this idea of the stand-alone question here. Glad to see that it’s being used in other places! :slight_smile:

You need to “de-contextualize” the question in a conversational context to do a proper semantic search. Imagine the conversation:

  • User: What is the capital of Spain?
  • Assistant: It is Madrid.
  • User: How many people do they live in there?

If you embed the question “How many people do they live in there?” and conduct the semantic search, you won’t retrieve documents that talk specifically about the population of Madrid. This is because it is a “contextual” question: it only makes sense in the context of the on-going conversation. You can solve this by using a module that de-contextualizes the contextual question into a “stand-alone” one. Something like “What is the population of Madrid?”

You can easily achieve this with an additional call to OpenAI. In my case, the “chat history” is composed only of the previous QAs: no documents. You don’t need the supporting documents to reformulate the contextual question into a stand-alone one: previous utterances are enough. In fact, I only send three QA pairs and that’s usually more than enough to produce the stand-alone question.
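A minimal sketch of what that additional call’s input could look like, assuming a chat-style model and the message format of the OpenAI chat API. Only the last three QA pairs are sent, as described above; the system prompt wording here is illustrative, not the poster’s exact text.

```python
def condense_messages(qa_pairs, new_question, max_pairs=3):
    """Build the message list for the stand-alone-question call."""
    messages = [{
        "role": "system",
        "content": (
            "Rephrase the user's last question as a stand-alone question "
            "that can be understood without the conversation."
        ),
    }]
    # Keep only the most recent QA pairs; more is rarely needed
    # to de-contextualize the follow-up question.
    for question, answer in qa_pairs[-max_pairs:]:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": new_question})
    return messages
```

The resulting list can be passed as the `messages` argument of a chat completion request.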

Hope it helps :slight_smile:


Yes! Thank you! This makes perfect sense.

So, my chat history would be:

1st question
1st response

2nd standalone question
2nd response

etc…

Many thanks for helping me to understand this, finally!


Of course. Very happy to help!
:slight_smile:

Great explanation.

Sometimes a conversation of 8 message pairs is required to fully contextualize and answer the question.
Sometimes, in a conversation of 12 pairs, only the last message is needed.


Good insights. Obviously, in the answering stage you try to send as large a portion of the ongoing conversation as you can, in combination with the supporting docs and the contextual question itself. As much info as possible.

But for de-contextualizing alone, in my experience a small number of utterances is usually more than enough. This is because each of these interactions already has a lot of context about the current conversational topic. I hardly ever (or never) run into situations where, in order to fully characterize the stand-alone question, I need to go back more than 5 previous utterances.

Also: you need to explicitly instruct the module to return the same question if the “contextual” question is already a “stand-alone” one that can be understood without the previous context.
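One illustrative wording for that pass-through instruction is shown below; the exact phrasing is an assumption, not the poster’s actual prompt.

```python
# Hypothetical condense-prompt text; the second sentence is the
# explicit pass-through instruction described above.
CONDENSE_PROMPT = (
    "Given the conversation and a follow-up question, rephrase the "
    "follow-up as a stand-alone question. If the follow-up already "
    "makes sense on its own, return it unchanged."
)
```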

Anyway, as this “de-contextualizer” only takes previous QAs as input, you have a lot of tokens to play with :slight_smile:. You can easily pass many previous utterances if your conversational context requires it.


I have not done this yet. But, I have completed the coding of my chat completion program, and it is working like a charm. I am chatting with my documents, and loving it! It’s really remarkable how little information, as you state, is necessary to maintain the context in chat history.

As I sit here in utter amazement at this accomplishment, I just wanted to thank you again for your assistance in helping me understand this process.

:+1:


Thank you for such positive feedback, @SomebodySysop! I’m really happy to hear that this is being useful to somebody else! :smiley:

This “standalone question” approach is really working super-well for me. I’m working on a project to create a semantic search module for the Drupal CMS, and I did this little query demonstration, which is coded in large part based on what I learned in this thread.

I understand that the standalone question is being used for the similarity search, but it is not clear to me whether that is also the question used to actually produce the answer, or whether the original question is.

I had mixed results using each. How should this be done?

I create a “concept” question from the standalone question for similarity search (context retrieval). I then submit the standalone question + context documents to the LLM for the response.
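The split described above can be sketched as follows. Here `make_concept`, `retrieve`, and `complete` are placeholders for the concept-question step, the vector store lookup, and the LLM call; the function names are made up for illustration.

```python
def respond(standalone_question, make_concept, retrieve, complete):
    """One query turn: retrieve with a concept question,
    answer with the full standalone question."""
    # Derive a shorter "concept" query used only for similarity search
    concept = make_concept(standalone_question)
    docs = retrieve(concept)
    # Answer with the full standalone question plus retrieved context
    return complete(standalone_question, docs)
```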

The way it should be done is the way that works best for you.

Now that we are getting much larger input context windows, I may start submitting the full chat history along with the original question.