Context generation for chat based Q&A bot

You are correct, you have to handle “API calls” in your application.
GPT and system message I’m using takes care about your problem. You have to use GPT4 and set temperature to 0. GPT is smart enough to rephrase your second question and replace “it” with “template” when it creates a SEARCH query for the knowledge base. Proof of concept using Playground below.

If you provide enough knowledge when first SEARCH is called, the second question will be answered without calling the SEARCH and you will save some tokens. See below:

I hope this answers your question.


Thank you so much! This is super helpful. You’re a legend! THANKS :smile:

How long time did it take for you to get access to the GPT-4 model API? I am still In the waiting list.

@natasha.donner For 8K model 2 days. 32K still waiting.

1 Like

Solved the issue of context using the “standalone question” Chat Completion Architechture - #2 by AgusPG

I’ve been using this approach and it appears to work 100% of the time (so far!).

1 Like

What is your prompt to create the standalone question? I tried the following prompt and it only works in 50% of the questions.

  prompt= f"""Combine the following question q with the information from memory to create a new specific question.
    Follow these rules: 
    Analyze q and memory, considering that memory contains the previous question. Determine if q is a follow-up question to the memory. If it is, replace anaphoric words in q to specifically target the subject in memory. If it's not a follow up question, simply respond with only q(Don't change q at all) and don't combine q and memory.
    q: \"{q}\", Memory: \"{memory}\".""",

I am using a different language, PHP. Here is the code with the prompt:

	$instruction = "Generate a standalone question which is based on the new question plus the chat history. Just create the standalone question without commentary. New question: ".$question;
	$chatHistory[] = ["role" => "user", "content" => $instruction];

    // Define the data for the API request
    $data = array(
        'messages' => $chatHistory,
        'model' => $model,
		'temperature' => $temperature

So, I append the $instruction you see above to the existing chat history and submit that as my chat completion ‘messages’.

It has worked so far on 100% of the questions I have sent. However, everybody’s application is different. My goal was for the model to keep general track of conversation, and I do this using the “standalone question”. Here is a brief conversation I just executed. My vector store has 3 articles concerning Star Trek. What you see below is my question, the question concept created (used to retrieve the context documents), the standalone question created and the assistant response.

Note how the assistant is able to keep track of what I am talking about as I switch from Star Trek to Harlan Ellison’s lawsuit to the Hugo award to the Netflix deal without specifically mentioning any of them :

Question (type ‘exit’ to quit): what do you know about star trek?
Star Trek
First Question: what do you know about star trek?
Response: Star Trek is a popular science fiction television series. One notable episode is titled “City on the Edge of Forever,” which was written by Harlan Ellison. The episode, which aired in April 1967, centers on time travel and stars Joan Collins. It won the WGA teleplay award and a Hugo award. Harlan Ellison sued CBS Paramount for not paying him for the merchandising, publishing, and other exploitations of the episode. Later, he settled the lawsuit with CBS Paramount, but the terms of the settlement were not disclosed. In 2011, Netflix signed a $200 million deal with CBS for nonexclusive streaming rights to popular TV shows, including Star Trek.

Question (type ‘exit’ to quit): who won the lawsuit?
chat history tokens 229
settlement of lawsuit
Standalone Question: Who did Harlan Ellison settle the lawsuit with?
Response: Harlan Ellison settled the lawsuit with CBS Paramount.

Question (type ‘exit’ to quit): did he win an award?
chat history tokens 260
Awards won by ‘City on the Edge of Forever’
Standalone Question: Did “City on the Edge of Forever” win any awards?
Response: Yes, “City on the Edge of Forever” won the WGA teleplay award and a Hugo award. The episode, written by Harlan Ellison, first aired in April 1967 and centered on time travel, starring Joan Collins.

Question (type ‘exit’ to quit): when was the deal?
chat history tokens 330
Netflix signed a $200 million deal with CBS for nonexclusive streaming rights to popular TV shows, including Star Trek
Standalone Question: When did Netflix sign a $200 million deal with CBS for nonexclusive streaming rights to popular TV shows, including Star Trek?
Response: Netflix signed a $200 million deal with CBS for nonexclusive streaming rights to popular TV shows, including Star Trek, on February 13, 2011.

How, may I ask, is your use case different? That may explain why you’re only getting 50% results.

1 Like

It seems that we have a similar use case, though I am currently using Python for my project and my embeddings index contains over 300 documents, some of which are very similar. My goal is to extract context for standalone questions from my embeddings. While I have followed a methodology similar to the one you provided in the picture, my approach differs by only including the new and previous questions to create a new standalone question, rather than incorporating all previous chat history. This approach may be the root of the issue.

Sending all previous messages and context could potentially exceed the token limit, particularly since some of the articles in my embeddings consist of at least 2000 tokens.

I will experiment with implementing a similar solution to yours to see if it produces better results.

Each new completion prompt only contains a. the system message, b. the standalone question and c. context documents. No need to resend the entire chat history for each completion. I only send the chat history to get the standalone question.

This is how use of the standalone question was explained to me: Chat Completion Architechture - #2 by AgusPG

This is a more detailed chart of my process:

If you have articles you are retrieving that consume 2K in tokens each, you may want to consider chunking them into smaller pieces. In these semantic searches, you only need to find the relevant content, which will usually be found in a paragraph or two. You can always return the link to the full article in your response.

1 Like

Hi guys,

I wanted to share that I have successfully addressed the follow-up question issue with GPT-3 using Python, and the solution is functioning well for me. To generate the standalone question, I have implemented a separate function that utilizes the Completion method, with a temperature of 0.7. I found that using a temperature of 0 did not produce desirable results.

def combine_question(df, model="text-davinci-003", q="What is a spotlist?",chat_history="", max_len=1500, size="ada", debug=False, max_tokens=1000, stop_sequence=None): 
    prompt= f"""Create a SINGLE standalone question. The question should be based on the New question plus the Chat history. 
    If the New question can stand on its own you should return the New question. New question: \"{q}\", Chat history: \"{chat_history}\".""",


        # Create a completions using the question and context
        response = openai.Completion.create(
            prompt= prompt,
        return response["choices"][0]["text"].strip()
    except Exception as e:
        return ""

1 Like

Thank you for providing the more detailed picture. The documentation that I am working with is quite complex, as it comprises step-by-step guidelines for a complicated website. Initially, I attempted to break down the documentation into smaller pieces, but the response was not satisfactory. Based on your suggestion, I will try chunking only the largest articles and providing a link to the full article instead. I think this will be a nice workaround for this issue.

Thank you for all the help! :smiley:

1 Like

right now i have set the prompt instruction of 200 tokens
each time i am hiitng the openai api with user question ,200 tokens is also going with the user question so user question+200prompt
because of this prompt instruction i m geeting the satisfactory repsonce
is there any way that my prompt instruction should go only one time not eveery time when user ask any question,so i can save token and money aslo

In my experience so far, no. You have to send the chat history each time, and I would recommend sending the system prompt at the top of your chat history each time.

This is how the completion process I am using, with the ‘standalone question’, is working for me so far. Very pleased with the results.

Open AI released support for function calling ot GPT 3.5 and 4, so now prompt can be much leaner and task oriented. And this approach will now also work on GPT 3.5. Did not test it, but it should.

1 Like

It does work indeed. (the standalone question is to be converted to in-context fully defined question where the context input is the chat history.)
I wonder if you can do it out of the box keeping the cha thread ID somehow (if it is GPT3) and not hassling with the history mgmt.

On the other hand, if it runs against your custom knowledge base vector you need to do build this logic.

Not sure I understand what you mean by “out of the box” and “not hassling with the history mgmt”. This is what Google says:

LLM stands for Large Language Model. LLMs are stateless, meaning their “state” is entirely determined by the prompt. LLMs are stateless by default. LLMs can remember previous interactions with a user if they have memory.

So, if you want the LLM to understand and keep track of your conversation, you will have to hassle with memory somehow. I mean, you could store the questions and responses in an SQL database, assigning an ID that reflects the conversation, but you still have to be able to retrieve it to let the model know what you are talking about.

Weaviate has generative feeback loops which will do this automatically: Generative Feedback Loops with LLMs for Vector Databases | Weaviate - vector database

But, again, not clear how you wish to address the issue of maintaining conversation context without dealing with memory.


Thank you for the quick reaction and clarification.
In case of a chat bot I think we have to go beyond the stateless nature and inject the conversation.
If we use the method of openai.ChatCompletion.create then we can add the context into the messages with the [“role”: system/user/assistant, content: ] parameter. In doing so, we need to collect the conversation into a list object.
I do not know if Chat GPT (in certain implementations) can return the conversation ID. This may help me maintain a more advanced context instead of using the last X of the messages which is not that professional and also increases the token count. Do you think it is possible to the conversation chain and simply put the model context via referencing it?

Secondly when I am using GPT via langchain (to apply it over a custom knowledge base chain) with these sequence of commands:
*embeddings = OpenAIEmbeddings… > docsearch = FAISS.from_texts(texts, embeddings) > *
chain = load_qa_chain(OpenAI(temperature=0.6), chain_type= “stuff”) > docs = docsearch.similarity_search(query) >, question=query)
I cannot quite see any elegant way of injecting the conversation messages into the method here.
The only way I am finding is to add the history to each question So my question is + " in the context of " + . However, it distorts the accuracy. What could be an accurate way to inject the context here?
Thank you in advance,

There is no “conversation ID” concept in the API currently. Maybe way in the future, but nothing available now. You’ll need to handle storing (and possibly periodically summarizing) conversation history. LangChain has memory tools to handle some of this, but not sure how they work together with the custom knowledge tools. Probably a better to ask on LangChain’s forums.

1 Like

Ditto on @novaphil suggestion: You need to consult with LangChain experts on this.

While I followed a LangChain methodology, I built my system, from scratch, using PHP.

The only thing I can tell you is that I don’t actually inject the entire conversation history into each model call. I generate a standalone question based upon the new question and chat history, and send THAT to the model along with context documents. I posted my process earlier in this conversation:

Good luck. It took me a heck of a long time to figure out my process (6 months), but I eventually did it. I’m sure you will too!

1 Like