The Elephant in the Room: Why No Persistent Conversational Memory in LLMs?

This one is already here: it’s called RAG, as you already know. It’s becoming more long-term in ChatGPT, and it uses more tokens as context, but it is not a new technology. OpenAI seems willing to add more history to ChatGPT (via brute force or RAG) to add value, so I think this is a given. But it does involve more infrastructure, and obviously more input tokens, which cost more to run. How fast this gets adopted will depend on the RAG/history infrastructure costs plus the additional compute costs; the lower those are, the faster.
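
To make the token-cost point concrete, here’s a minimal sketch of what RAG-style conversational memory boils down to. The bag-of-words embedding is a toy stand-in for a real embedding model, and `ConversationMemory` is just an illustrative name, not anyone’s actual API:

```python
# Minimal sketch of RAG-style conversational memory.
# The embedding here is a toy word-count vector, NOT a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ConversationMemory:
    def __init__(self):
        self.turns = []  # (text, embedding) pairs from past conversations

    def store(self, text: str):
        self.turns.append((text, embed(text)))

    def recall(self, query: str, k: int = 3):
        q = embed(query)
        ranked = sorted(self.turns, key=lambda t: cosine(q, t[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = ConversationMemory()
memory.store("User prefers concise answers with code examples.")
memory.store("User is building a Rust CLI tool.")
memory.store("User asked about LoRA fine-tuning last week.")

# The recalled snippets get prepended to the prompt as extra context,
# which is exactly where the added input-token cost comes from.
print("\n".join(memory.recall("how do I fine-tune a small model?")))
```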

I think there was also a mention, or a hint, of a non-RAG way of doing it that involves only compute, without a bunch of DB stuff: a separate front-end “preference model” tuned to each user. These small models could be retrained regularly to adapt to the user’s preferences and past history, and could even form “memories” of past interactions that would influence the discussion.
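
Here’s a rough sketch of what such a per-user model might look like, assuming it’s trained as a small scorer over response embeddings. `PreferenceModel` and `nightly_update` are hypothetical names, and the tiny architecture is purely illustrative:

```python
# Hedged sketch of a per-user "preference model": a tiny network retrained
# periodically on the user's interaction history. All names/shapes are
# illustrative assumptions, not a real product design.
import torch
import torch.nn as nn

class PreferenceModel(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        # Scores how well a candidate response matches this user's taste.
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(response_embedding)

def nightly_update(model: PreferenceModel, history) -> None:
    # history: (embedding, rating) pairs mined from past interactions.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for emb, rating in history:
        opt.zero_grad()
        loss = loss_fn(model(emb), rating)
        loss.backward()
        opt.step()

model = PreferenceModel()
# Fake history just so the sketch runs end to end.
fake_history = [(torch.randn(64), torch.tensor([1.0])) for _ in range(8)]
nightly_update(model, fake_history)
```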

What’s cool about this is that you could export the weights of this model to another vendor, model, or system and resume your preferences and memories there, assuming such models get standardized and become easily portable. All without big DB transfers, which would require some sort of ETL unique to each DB, plus embedding costs and overhead, since nobody uses the same embedding model.
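
Continuing the sketch above, the portability story could then be as simple as exporting and importing a weights file, assuming the architecture ever gets standardized across vendors (which it isn’t today):

```python
# Continuing the PreferenceModel sketch above: export the weights, then
# reload them on a different system that implements the same (hypothetical)
# standardized architecture. No DB dump, no re-embedding, just a file.
import torch

torch.save(model.state_dict(), "preference_model.pt")

# Elsewhere, possibly on another vendor's stack:
fresh = PreferenceModel()
fresh.load_state_dict(torch.load("preference_model.pt"))
```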

If anything, you should start training your own preference model and using it to compactly store information about yourself over time. Then use it in conjunction with RAG to “oversee” the entire generation that the LLM is producing.
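
One simple reading of “oversee” is reranking: sample several candidates from the LLM and let the preference model pick the one that best matches your stored taste. Everything here (the scorer, the embedding, the candidate generator) is a toy stand-in, self-contained so the sketch runs on its own:

```python
# Sketch of "overseeing" generation via preference-model reranking.
# All components are toy stand-ins for a real LLM client and embedder.
import torch
import torch.nn as nn

# Stand-in for the per-user preference scorer from the earlier sketch.
scorer = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

def embed_response(text: str) -> torch.Tensor:
    # Toy embedding: an arbitrary fixed-size vector derived from the text.
    g = torch.Generator().manual_seed(hash(text) % (2**31))
    return torch.randn(64, generator=g)

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for a real LLM call that samples n completions.
    return [f"{prompt} (candidate {i})" for i in range(n)]

def oversee(prompt: str, n: int = 4) -> str:
    # Rerank: return the candidate the preference model scores highest.
    candidates = generate_candidates(prompt, n=n)
    scores = [scorer(embed_response(c)).item() for c in candidates]
    return candidates[scores.index(max(scores))]

print(oversee("explain RAG briefly"))
```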

:thinking:
