How do you maintain historical context in repeat API calls?

I’ve got this working in a Google Sheet:

Feeding an NLP analysis of the last response back into the follow-up prompt helps sustain conversation flow. GPT is asked to extract keywords, named entities, context and sentiment from the response, and these are prepended to the follow-up interaction. In this way the conversation appears to flow naturally.

Topic, DoNLP, LastResponse and FollowUp are range names.

DoNLP contains: "analyse the Prompt using NLP and return topic, context, named entities, keywords and sentiment and then respond to the Follow Up question :"

FollowUp contains: "Who were the main characters"

In A10 is the formula ="On the topic of: "&Topic&" "&DoNLP&CHAR(10)&CHAR(10)&LastResponse&CHAR(10)&"Follow up: "&FollowUp

Where the last response was about Bulgakov’s novel The White Guard, the next prompt becomes:

analyse the Prompt using NLP and return topic, context, named entities, keywords and sentiment and then respond to the Follow Up question : The White Guard was written by the Ukrainian writer Mikhail Bulgakov. It is a novel that depicts the events of the Ukrainian Revolution of 1918 and the subsequent civil war in Ukraine. Bulgakov is also known for his famous novel, The Master and Margarita.

(Source gpt-3.5-turbo Temperature 0.7)
and respond to the follow up question : Who were the main characters
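The same prompt composition can be sketched in Python. This is only a translation of the Sheets formula above; the parameter names mirror the range names and are my own labels, not part of any API:

```python
def build_prompt(topic: str, do_nlp: str, last_response: str, follow_up: str) -> str:
    """Mirror of the A10 formula: topic header, NLP instruction,
    last response, then the follow-up question."""
    return (
        f"On the topic of: {topic} {do_nlp}\n\n"
        f"{last_response}\n"
        f"Follow up: {follow_up}"
    )
```

The resulting string would then be sent as the next prompt, just as the spreadsheet cell is.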


From the docs over here: OpenAI API

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

If you use Python, you could wrap this in a function like:

import openai

def chat_with_gpt3(history, newprompt):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += history
    messages.append({"role": "user", "content": newprompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return response.choices[0].message.content.strip()

and call it like

history = [
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
]

newprompt = "Where was it played?"

result = chat_with_gpt3(history, newprompt)

I’m not sure if this perfectly answers the question above, but hopefully folks coming across this thread can find it useful. You can add previous prompts, and you could also create a ‘summarize’ function (itself using GPT) to shorten long previous conversations, updating the history to simply:

history = [
    {"role": "user", "content": "Summarize our conversation so far"},
    {"role": "assistant", "content": "{summary}."},
]
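The ‘summarize’ idea can be sketched as follows. `compress_history` and its message threshold are my own names, and the `summarize` callable stands in for an actual chat-completion call so the compression logic is shown on its own:

```python
def compress_history(history, summarize, max_messages=6):
    """If the history is short, keep it; otherwise replace it with a
    two-message summary produced by the injected summarize() callable
    (in practice, a GPT call over the transcript)."""
    if len(history) <= max_messages:
        return history
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    summary = summarize(transcript)
    return [
        {"role": "user", "content": "Summarize our conversation so far"},
        {"role": "assistant", "content": summary},
    ]
```

Injecting the summarizer also makes it easy to swap in a cheaper model just for compression.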


Hey Xiaokunhou, I came across this and managed to work it out


import openai
import os
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain

llm = OpenAI(temperature=0)
conversation_with_summary = ConversationChain(
    llm=llm,
    # We set a very low max_token_limit for the purposes of testing.
    memory=ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40),
)

conversation_with_summary.predict(input='FIRST PROMPT')
conversation_with_summary.predict(input='SECOND PROMPT')
conversation_with_summary.predict(input='N PROMPT')

This gives a very ChatGPT-like experience. However, I still managed to hit the token limit pretty quickly after feeding in two documents.


I ran into the same problem. My use case doesn’t depend heavily on context and isn’t conversation based: I have a tool that extracts information from text. What I did is set an initial system message that establishes the conversation’s theme. The message was slightly long because I included an example in it for GPT to learn from, but it doesn’t have to be long in everyone’s case, and it does a decent job of providing context.

I inspected the payload on OpenAI’s own frontend, and they don’t seem to be sending the entire chat history. They send some ID (probably a conversation ID) and my most recent prompt as the payload. Maybe they store the conversation on their backend and prepend it in more efficient ways, and presumably they can use it as training data later.

I’m curious as to what those more efficient ways could be.
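Purely as speculation about what that payload shape implies, a server-side store keyed by conversation ID might look like this minimal sketch (all names here are hypothetical, not anything OpenAI has documented):

```python
import uuid

class ConversationStore:
    """Hypothetical backend store: the client sends only a conversation
    ID plus the newest message; the server prepends the stored history
    before calling the model."""

    def __init__(self):
        self._conversations = {}

    def start(self) -> str:
        cid = str(uuid.uuid4())
        self._conversations[cid] = []
        return cid

    def add(self, cid: str, role: str, content: str) -> None:
        self._conversations[cid].append({"role": role, "content": content})

    def messages_for_request(self, cid: str, new_user_message: str):
        # What the model would actually see for this turn.
        return self._conversations[cid] + [
            {"role": "user", "content": new_user_message}
        ]
```

On top of this, the server could summarize or truncate old turns before prepending them, which would explain why the client never needs to resend the full history.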


Perhaps this approach will be useful:

    async getResponse(inputText: string) {
        try {
            this.previousResponses.push({ role: 'user', content: inputText });

            const messages: IChatMessage[] = [...this.previousResponses];

            // Assuming `this.openai` holds a v3 Node SDK client (OpenAIApi)
            const response = await this.openai.createChatCompletion({
                model: 'gpt-3.5-turbo-16k-0613',
                temperature: 0.5,
                messages,
            });

            return response;
        } catch (error) {
            Logger.error(`Error querying OpenAI API: ${error}`);
            throw new Error(`Error querying OpenAI API: ${error}`);
        }
    }
The page does not load; do you have another link? I would like to read it.

Here is a link to the article you wanted

The domain name changed but the content is still there


Don’t know if this is relevant, but…

I use a ‘shadow’ conversation built from a vector DB that sends only what is needed to reply to the current message. In the chat window the conversation is full and complete, but the requests are a mish-mash of RAG, a little text file that maintains context (‘we’re talking about [thing]’), and summarised portions of the conversation.

LLMs are completely stateless, this can be used to your advantage as you can craft exactly what it sees.
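The ‘shadow conversation’ idea above can be sketched like this. The function and parameter names are my own, and `retrieved_chunks` stands in for a vector-DB lookup that is not shown here:

```python
def build_shadow_messages(context_note, summary, retrieved_chunks, user_message):
    """Assemble what the model actually sees: a running-context note,
    a summary of the conversation so far, and retrieved excerpts --
    never the full chat log."""
    system = (
        f"Context: {context_note}\n"
        f"Conversation so far (summary): {summary}\n"
        "Relevant excerpts:\n" + "\n".join(f"- {c}" for c in retrieved_chunks)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```

The full conversation shown in the chat window is kept separately; only this crafted shadow is ever sent to the model.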

Since the OP has not visited the site since the day after posting, closing this topic.

If others still have questions they would like answered, please DM an API moderator and ask them to split your post(s) into a new topic.