Add long-term memory per user using the OpenAI Agents SDK

Adding “memory” can take several forms; even just the previous chat turn is a kind of memory.

  1. Extending chat session knowledge
  • Older chat exchanges are not simply discarded once they fall out of the context budget; they can be re-inserted ahead of the recent lossless chat, via semantic retrieval against the latest input.
  2. An internal chat state maintained by the AI
  • Inject “here’s your current self-managed memory block…”, accompanied by a function to update or rewrite items, along with guidance on using it.
  • The AI might use this to hold session data such as ongoing games or goals, or to keep data hidden from the user, like their character sheet or a game maze.
  • (AI models don’t do this well)
  3. A cross-session memory
  • Similar to the previous pattern, but it aims to learn about the user and their preferences, and must avoid local, task-specific updates.
  • Also wholly injected.
  • The function that updates the text items may itself be AI-powered, prompted to manage the store and decide what to update or delete.
  • (the ChatGPT pattern)
  4. Passive conversation history search
  • Uses semantic search over essentially everything recorded across the sessions in a user’s history.
  • Injection must be framed with caution, as the AI may grant retrieved text too much relevance (taking a past “I don’t want any code” as a current instruction, for example).
  • (also a ChatGPT pattern)
  5. Observational AI memory updates
  • No tool is presented to the chatbot; instead, a separate AI is tasked with scraping and summarizing criteria-meeting memories, not yet recorded, from the latest preference-like indications.
  • Can also be a condensed injection, or use semantic search to surface topical items to inject.
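The self-managed memory block of patterns 2 and 3 can be sketched as plain Python. This is an illustration, not the SDK’s own API: `MemoryStore` and its method names are hypothetical. With the OpenAI Agents SDK you would wrap `update` with the `@function_tool` decorator, pass it in the agent’s `tools` list, and render the block into the agent’s instructions each turn.

```python
import json

class MemoryStore:
    """Hypothetical per-user memory block: numbered text items the model can edit."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.items = json.load(f)
        except FileNotFoundError:
            self.items = []

    def render(self):
        """Produce the wholly-injected block placed into the system prompt."""
        if not self.items:
            return "Your memory is currently empty."
        body = "\n".join(f"{i}. {t}" for i, t in enumerate(self.items, 1))
        return "Here is your current self-managed memory block:\n" + body

    def update(self, index, text):
        """Tool the model calls: append with index 0, delete an item by passing
        empty text, otherwise replace item `index` (1-based)."""
        if index == 0:
            self.items.append(text)
        elif text == "":
            del self.items[index - 1]
        else:
            self.items[index - 1] = text
        with open(self.path, "w") as f:
            json.dump(self.items, f)
        return self.render()
```

For the cross-session variant (pattern 3), the same store would be loaded by user ID at session start and the tool’s guidance would steer the model toward durable preferences rather than task state.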

It seems your biggest hurdle is making the leap to semantic search, typical of cases 1 and 4.

You need a pattern of query construction run against a database of items to be retrieved, with search more powerful than any programmatic keyword search. This is a vector database.

A vector database entry is a string plus its AI-produced embedding. An item only needs to be encoded to a vector once to form a database of texts. A fast exhaustive search then compares a query’s vector against those stored, using cosine similarity (a plain dot product if the vectors are pre-normalized, as OpenAI embeddings are), ranking the quality of results and constructing a budgeted injection.
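A minimal sketch of that exhaustive search, assuming you already have each text’s embedding (e.g. from OpenAI’s `text-embedding-3-small` endpoint). The vectors here are toy three-dimensional values standing in for real embeddings, and the function names are illustrative:

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length so cosine similarity reduces to a dot product."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def search(query_vec, db, top_k=3):
    """Rank (text, vector) items in db by dot product against the query vector."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), text) for text, vec in db]
    scored.sort(reverse=True)
    return [(text, score) for score, text in scored[:top_k]]

def build_injection(results, char_budget=1000):
    """Pack top-ranked texts into a budget-limited block for prompt injection."""
    lines, used = [], 0
    for text, _score in results:
        if used + len(text) > char_budget:
            break
        lines.append(text)
        used += len(text)
    return "\n".join(lines)

# Toy database: in practice each vector comes from one embeddings API call per item.
db = [
    ("User prefers Python examples.", normalize([0.9, 0.1, 0.0])),
    ("User is building a chess bot.", normalize([0.1, 0.9, 0.2])),
    ("User dislikes verbose answers.", normalize([0.2, 0.1, 0.9])),
]
query = normalize([0.8, 0.2, 0.1])  # stands in for the embedding of the latest user input
results = search(query, db, top_k=2)
```

At real scale you would hold the vectors in a NumPy matrix and score all items with a single matrix-vector product, but the logic is the same.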

You can code this yourself using embeddings and plain in-memory vectors, with persistent backing storage that accompanies your user chat history database (depending on how extensive the user search surface is).
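The persistence side of that do-it-yourself approach can be as simple as a JSON file per user alongside the chat history, reloaded into memory for search. The file layout and function names here are illustrative, assuming the embeddings arrive pre-normalized:

```python
import json

def save_vectors(path, items):
    """Persist (text, embedding) pairs as JSON next to the user's chat history."""
    with open(path, "w") as f:
        json.dump([{"text": t, "vec": v} for t, v in items], f)

def load_vectors(path):
    """Reload the store into memory for fast exhaustive dot-product search."""
    with open(path) as f:
        return [(d["text"], d["vec"]) for d in json.load(f)]
```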

Once you have a proof of concept and need to scale, you can explore a provider that delivers a robust product with initial query fields, such as user ID and date range, run against a larger homogeneous vector store.
