Short-lived memory for a chatbot

Currently I have:

  • an AWS Lambda function that receives a question from a user along with a relevant document.
  • the Lambda makes a request to the OpenAI API with a prompt that uses the question and the document as context.

User requests:

{ "question": "how is john feeling", "context": "the employees john...." }


Answer the question {question} given the relevant context.

context ###
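A minimal sketch of how the Lambda handler might assemble that prompt. The event shape and the exact template (including the delimiter) are assumptions based on the snippet above, not a confirmed implementation:

```python
# Hypothetical prompt assembly inside the Lambda handler.
PROMPT_TEMPLATE = (
    "Answer the question {question} given the relevant context.\n\n"
    "{context} ###"
)

def build_prompt(event: dict) -> str:
    """Fill the prompt template from the incoming request body."""
    return PROMPT_TEMPLATE.format(
        question=event["question"],
        context=event["context"],
    )

prompt = build_prompt({
    "question": "how is john feeling",
    "context": "the employees john....",
})
```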

What I’m thinking

  • The relevant context is retrieved from a SQL database.
  • I’d like to cache the context so it isn’t recreated on every request.
  • Ideally, some pattern of associating a session ID with each user and context, expiring within 15 minutes.
  • When the session expires, the data is deleted from the cache.
  • If the session ID has not expired, the Lambda gets the context from the cache; otherwise, it is retrieved from SQL and put in the cache for future requests.

I’m wondering if this pattern is recommended, whether there are any existing solutions similar to it, or if there are any suggestions for going down this line of thought.


I don’t know if this pattern is ‘recommended’; I imagine that depends on your use case, but I don’t see anything particularly wrong with it.

As far as suggestions, I highly recommend you check out the ChatGPT retrieval plugin (it’s on the OpenAI GitHub under chatgpt-retrieval-plugin); it should have some functions that are helpful for what you’re trying to do.

In my experience, the goal when prompting ChatGPT (beyond the explicit goal) is either to reduce the number of tokens in the message/reply, or to use cheaper models to their full extent (e.g., API calls to Ada cost roughly 1/10th as much as calls to Davinci). Here are several methods I know of that achieve these goals.

First, if the document can be broken into sections, you can have a cheap model (like Ada) select the relevant section for you, and then pass only that section to a more capable model for answering the question.
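One common way to do that selection is with embeddings: embed each section and the question, then pick the section with the highest cosine similarity. The sketch below uses a trivial bag-of-words vector as a stand-in for real embeddings (e.g. from the Ada embedding endpoint); only the selection logic is the point:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding call."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_section(question: str, sections: list[str]) -> str:
    """Return the section most similar to the question."""
    q = embed(question)
    return max(sections, key=lambda s: cosine(q, embed(s)))
```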

Second, if you have a structured question with predictable answers (yes/no, or ranking something), you can use embeddings, flow control, and constraints (LMQL is useful for this) to make a cheaper model behave in complex and consistent ways.

Finally, if you are going to have a conversational chatbot, you will need a way to send the message history to ChatGPT. One way to use fewer tokens in a chat like this is to send the chat history to ChatGPT and have it summarize the entire chat every ‘X’ messages. ‘X’ will differ based on API token use over time in a conversation: conversations where more context is added consistently will need a lot of summarization, whereas conversations that fit within ChatGPT’s context window won’t need any summarization at all.
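A rough sketch of that trigger logic: estimate the history's token count and collapse it into a summary once it exceeds a budget. The budget value and the words-as-tokens approximation are placeholders, and `summarize` stands in for a real call to ChatGPT:

```python
TOKEN_BUDGET = 200  # assumed budget; tune to your model's context window

def approx_tokens(messages: list[str]) -> int:
    """Crude estimate: treat each whitespace-separated word as one token."""
    return sum(len(m.split()) for m in messages)

def compact_history(messages: list[str], summarize) -> list[str]:
    """Collapse the history into one summary message once it exceeds the budget."""
    if approx_tokens(messages) <= TOKEN_BUDGET:
        return messages
    return ["[summary] " + summarize(messages)]
```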

I hope I was able to be of some help, and good luck with your project :slight_smile:
