Adding “memory” can take several forms. Even the previous chat turn is a form of memory.
- Extending chat session knowledge
  - Older chat exchanges are not discarded outright when they fall out of the context budget; instead, they can be re-inserted ahead of the recent lossless chat, using semantic retrieval against the latest input.
- An internal chat state maintained by the AI
  - Inject “here’s your current self-managed memory block…”, accompanied by a function to update or rewrite items, along with guidance (see the first sketch after this list).
  - This might be used by the AI to hold session data like ongoing games or goals, or to hold data hidden from the user, like their character sheet or a game maze.
  - (AI models don’t do this well)
- A cross-session memory
  - This is similar to the last, but aims to learn about the user and their preferences, and must avoid storing local, task-specific updates.
  - Also wholly injected.
  - The function that updates the text items may itself be AI-powered, prompted to manage the list and decide what to update or delete.
  - (ChatGPT pattern)
- Passive conversation history search
  - Uses semantic search over essentially everything recorded across the user’s past sessions.
  - Injection must be framed with caution, as the AI may give retrieved text too much weight (treating a past “I don’t want any code” as a current instruction, for example).
  - (ChatGPT pattern)
- Observational AI memory updates
  - No tool is presented to the chatbot; instead, a separate AI is tasked with scraping and summarizing memories that meet your criteria and aren’t already known, drawn from the latest preference-like statements (see the second sketch after this list).
  - The result can also be a condensed injection, or feed a semantic search for topical items to inject.
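For cases 2 and 3, here’s a minimal sketch of the injection-plus-tool shape, assuming the OpenAI Python SDK; the `update_memory` tool name, the guidance text, and the in-process list standing in for per-user storage are all illustrative, not any product’s API:

```python
# Sketch: inject a self-managed memory block and offer an update tool.
# "update_memory" and the memory_items list are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()
memory_items = ["User prefers concise answers."]  # persist this per user

tools = [{
    "type": "function",
    "function": {
        "name": "update_memory",
        "description": "Replace the numbered memory items with a new list.",
        "parameters": {
            "type": "object",
            "properties": {
                "items": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["items"],
        },
    },
}]

system = (
    "Here is your current self-managed memory block:\n"
    + "\n".join(f"{i + 1}. {m}" for i, m in enumerate(memory_items))
    + "\nCall update_memory to add, rewrite, or delete items whenever the "
      "user states a durable preference. Do not store one-off task details."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": system},
              {"role": "user", "content": "Please always answer in Spanish."}],
    tools=tools,
)

# Apply any memory rewrite the model requested.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "update_memory":
        memory_items = json.loads(call.function.arguments)["items"]
```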
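Case 5 needs no tool presented to the chatbot at all; a second, offline call does the extraction. A sketch under the same SDK assumption; the extraction prompt and JSON shape here are mine, not a known product’s:

```python
# Sketch: a separate "observer" call that scrapes preference-like facts
# from the latest exchange. Prompt wording and output shape are illustrative.
import json
from openai import OpenAI

client = OpenAI()

def extract_memories(latest_turns: str, known: list[str]) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "system",
            "content": 'Return JSON {"memories": [...]} listing durable '
                       "user preferences stated in the conversation that are "
                       "not already in this known list: " + json.dumps(known),
        }, {"role": "user", "content": latest_turns}],
    )
    return json.loads(resp.choices[0].message.content)["memories"]

new_items = extract_memories("user: I'm vegetarian, plan my meals.", known=[])
```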
It seems your biggest hurdle is making the leap to semantic search, typical of cases 1 and 4 (extending session knowledge and passive history search).
You need a pattern of query construction used against a database of items to be retrieved, a search more powerful than any programmatic keyword search. This is a vector database.
A vector database is, at its core, a string plus its AI-produced embedding. An item only needs to be encoded to a vector once to form a database of texts. Then a fast exhaustive search compares a query’s vector to the stored ones with cosine similarity (a plain dot product if the vectors are pre-normalized, as OpenAI embeddings are), ranking the quality of results and constructing a budgeted injection.
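A minimal sketch of that loop, assuming the OpenAI Python SDK and numpy; the stored texts are placeholders:

```python
# Sketch: embed each item once, then rank stored texts against a query.
# OpenAI embeddings come back unit-normalized, so dot product == cosine.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

texts = ["User asked about pandas dataframes.",
         "User said they dislike code in answers.",
         "User's cat is named Miso."]
vectors = embed(texts)          # encode each item to a vector once

def search(query: str, top_k: int = 2) -> list[tuple[float, str]]:
    q = embed([query])[0]
    scores = vectors @ q        # dot product against every stored vector
    best = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), texts[i]) for i in best]

for score, text in search("what did they say about code style?"):
    print(f"{score:.3f}  {text}")
```

From the ranked results you then take items in score order until your injection token budget is spent.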
You can code this yourself using embeddings and plain in-memory vectors, with backing persistent storage that accompanies your user chat history database (depending on how extensive the user’s search surface is).
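For that backing persistent storage, a small sketch using SQLite blobs next to your chat history; the table layout is an assumption:

```python
# Sketch: persist embeddings beside the chat history as SQLite blobs.
# Load them back into memory at startup and search with the dot product above.
import sqlite3
import numpy as np

db = sqlite3.connect("memory.db")
db.execute("""CREATE TABLE IF NOT EXISTS memories
              (user_id TEXT, text TEXT, vector BLOB)""")

def save(user_id: str, text: str, vector: np.ndarray) -> None:
    db.execute("INSERT INTO memories VALUES (?, ?, ?)",
               (user_id, text, vector.astype(np.float32).tobytes()))
    db.commit()

def load(user_id: str) -> tuple[list[str], np.ndarray]:
    rows = db.execute("SELECT text, vector FROM memories WHERE user_id = ?",
                      (user_id,)).fetchall()
    texts = [r[0] for r in rows]
    vecs = np.array([np.frombuffer(r[1], dtype=np.float32) for r in rows])
    return texts, vecs
```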
Once you have a proof of concept and need to scale, you can then explore a provider that delivers a robust product, offering initial query fields such as user ID and date range against a larger homogeneous vector store.