Retain past responses in memory without sending them again at every API request


I’m trying to use the API for gpt-3.5-turbo and gpt-4 to elicit a number of responses based on a (very long) initial set of instructions followed by individual sentences. The instructions explain to the model what to do with each input sentence. The interaction would look as follows:

  • Very long initial instruction detailing what to do with each input + Input 1
  • → GPT output
  • Input 2
  • → GPT output
  • Input 3
  • → GPT output
    and so on.

In the user interface it’s very easy to do this; however, with the Python API, every call to openai.ChatCompletion.create seems to require the full chat history for the model to remember anything. This obviously eats up a lot of tokens (and, most importantly, money!). The number of tokens itself is not an issue, as I’m still below the 8k limit for GPT-4, but having to pay to send the full set of instructions as input tokens with every API request (which isn’t necessary when using the UI) is annoying.

So my question is: is there a way to use the Python API interactively, a bit like the UI, so that a GPT model remembers previous answers and instructions without my having to send the full set of instructions with every single request, wasting money unnecessarily? Happy to provide a short reproducible example if the question isn’t clear enough.


Hello. This is how it works, i.e. you need to append the history yourself to keep context. ChatGPT is likely doing the same thing on the backend, possibly along with summarizing extremely old messages in the chat chain.

So the best way to keep costs down is to be lean and mean with your prompts: keep editing until you reach the bare minimum you need to get your task done.
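In practice, "appending to keep context" means maintaining the messages list yourself and resending it on every call. A minimal sketch of that pattern (here `call_model` is a stand-in for the real `openai.ChatCompletion.create(model="gpt-4", messages=...)` call, so the example runs without an API key):

```python
def call_model(messages):
    # Stand-in for openai.ChatCompletion.create: echoes the last
    # user message so the sketch is runnable offline.
    return "echo: " + messages[-1]["content"]

class Chat:
    def __init__(self, instructions):
        # The long instruction set goes in once, as the system message,
        # but note it is still resent (and billed) on every request.
        self.messages = [{"role": "system", "content": instructions}]

    def send(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        reply = call_model(self.messages)
        # Append the assistant reply so the next call has full context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply

chat = Chat("Rewrite each input sentence in formal English.")
chat.send("Input 1")   # history now holds 3 messages
chat.send("Input 2")   # history now holds 5 messages
```

Every `send` grows the list by two messages, which is exactly why the token cost grows with the conversation.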


So there’s no way to do it without sending the previous chat?

No way. Think about the logic.

If your app doesn’t send the chat history, how would the model remember it? That would mean the API keeps chat history by default, and if that happened, the current ChatGPT API workload could be 10x higher.

— No, it’s impossible: every time you call the API, there is no chat-session ID as far as I know.
— Even if there were a session ID, where would the chat history be stored? For 30 days? 60 days?
— Would it cost extra money to store the chat history?
— And I would still need to opt out of the chat-history feature for use cases that don’t need it.

So I guess this is what OpenAI is doing now: no automatic chat history, and an 8k token limit per call.

— Use it at your own extra cost (not everybody else’s): make the calls and record your own chat-session history.
— The limit is 8k, not 80k, so you still get a good response time.

I would opt for a higher token limit, but not for chat history being on by default, because not every scenario needs the chat history.
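Since you record the history yourself, staying under a per-call limit like 8k means trimming the oldest turns while always keeping the system instructions. A rough sketch of that (token counting here is a simple word-count stand-in; a real implementation would count tokens with the tiktoken library):

```python
def approx_tokens(text):
    # Crude stand-in for real token counting (e.g. via tiktoken).
    return len(text.split())

def trim_history(messages, budget):
    """Keep messages[0] (the system instructions) and drop the oldest
    other messages until the total is within the token budget."""
    system, rest = messages[0], messages[1:]
    while rest and (approx_tokens(system["content"]) +
                    sum(approx_tokens(m["content"]) for m in rest)) > budget:
        rest = rest[1:]  # drop the oldest non-system message
    return [system] + rest

history = [{"role": "system", "content": "long instructions here"}]
for i in range(1, 6):
    history.append({"role": "user", "content": f"input number {i}"})
    history.append({"role": "assistant", "content": f"output number {i}"})

trimmed = trim_history(history, budget=20)
```

The trade-off is the same one discussed above: whatever gets trimmed, the model no longer "remembers".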

If you are using LangChain, consider using map-reduce as the chain type, since it summarizes everything before sending it to OpenAI for chat completion. If you use the stuff chain type, it sends everything as-is, so over time the chat history keeps growing; once it no longer fits, the model won’t remember the earlier parts.

The OpenAI API requires the prompt and chat history to predict the next token; it isn’t built to remember anything, only to complete a prompt based on what the user sends it. That’s why we store the history outside of OpenAI, summarize it along with the instructions, and then send it to OpenAI. The model then acts as if it remembers, but in reality it only sees what you are sending in the current request.
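The summarize-and-resend pattern described in the last two replies can be sketched without LangChain: keep the most recent turns verbatim and collapse everything older into a single summary message that rides along as context. The `summarize` function below is a trivial stand-in; in practice you would ask the model itself to produce the summary.

```python
def summarize(messages):
    # Stand-in summarizer: just concatenates the old turns.
    # In practice this would be another model call asking for a summary.
    return "Summary of earlier conversation: " + "; ".join(
        m["content"] for m in messages)

def build_request(system, history, keep_recent=4):
    """Build the message list for one API call: system instructions,
    a summary of old turns, then the last `keep_recent` turns verbatim."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    messages = [system]
    if old:
        messages.append({"role": "system", "content": summarize(old)})
    return messages + recent

system = {"role": "system", "content": "instructions"}
history = [{"role": "user", "content": f"turn {i}"} for i in range(8)]
request = build_request(system, history)
# request: instructions, summary of turns 0-3, then turns 4-7 verbatim
```

This keeps the per-call token cost roughly constant at the price of lossy memory of older turns, which matches how the replies above suggest ChatGPT itself may handle very long chats.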