The Elephant in the Room: Why No Persistent Conversational Memory in LLMs?

Here’s what’s fascinating… OpenAI, Anthropic, Google, Meta, Perplexity, etc. These companies all have:

  • World-class AI engineers
  • Cutting-edge language models
  • Massive computing infrastructure
  • Deep user behavior data

Yet somehow, this fundamental need hasn’t been prioritized. Think about it: These companies already:

  • Track conversation history
  • Understand context
  • Identify patterns
  • Process complex relationships

The technical barrier isn’t the issue. They could build this. The infrastructure exists. The AI capabilities are there. In fact, their models are already doing most of this analysis in real time…they’re just not preserving it.

It’s particularly interesting because this isn’t just a “nice to have” feature. This is really about

  • Maximizing user value
  • Increasing platform stickiness
  • Building competitive advantage
  • Enabling genuine user growth

For companies racing to build the next viral feature or chasing the latest AI breakthrough, this feels like overlooking the obvious (to say the least…). It’s like having a Ferrari but forgetting to install a steering wheel :joy:.

13 Likes

Because you would need a thousand guys unpacking NVMe drives all day…

2 Likes

What you’re asking is a much more complicated problem to solve than you think. Current LLMs are, at the core, an autocomplete algorithm — give them text as input, and they’ll predict the next word (or “token”, piece of word) based on patterns they learned from their dataset during training.

Format that text like a conversation, and you’ve got a chatbot.

Train the model on hard problems and reward it when it solves them, and you’ve got models like o1 and o3.

But you’ve still got to give the model text as input. There’s a physical limit — whether it be the amount of RAM, storage, or compute time — based on whatever hardware these models are running on.

You could keep “training” the model on user input over time to have it “remember” things from the past without having to include it as input, but you’d have to do that per-user and the costs would be stupidly high.

Context length is improving at a massive rate. Google’s Gemini can already take multiple novels as input. That’s more than everything an average person is willing to say about themselves.
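
To make the “format that text like a conversation” point concrete, here’s a minimal sketch (the `generate` function is a stand-in for any completion call, not any particular API):

```python
# Minimal sketch: a "chatbot" is a next-token predictor fed formatted text.
# `generate` is a hypothetical stand-in for any LLM completion call, not a real API.

def generate(prompt: str) -> str:
    """Pretend next-token predictor; a real call would hit an LLM endpoint."""
    return "..."  # placeholder completion

history: list[tuple[str, str]] = []  # the only "memory" the model ever sees

def chat(user_message: str) -> str:
    history.append(("user", user_message))
    # Every turn, the *entire* transcript is re-serialized into one prompt.
    prompt = "\n".join(f"{role}: {text}" for role, text in history) + "\nassistant:"
    reply = generate(prompt)
    history.append(("assistant", reply))
    return reply

# Once the serialized history exceeds the context window (a hard RAM/compute limit),
# something has to be dropped, summarized, or stored outside the model.
```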

4 Likes

Thank you for the thoughtful breakdown. While you raise valid points about LLMs’ fundamental architecture and limitations, I believe this actually reinforces rather than contradicts my original argument.

The solution I’m advocating for doesn’t necessarily require expanding core LLM capabilities or implementing costly per-user training. The infrastructure and technical components already exist…they’re just not being leveraged optimally.

Consider how these platforms already:

  • Process and understand complex queries in real-time
  • Identify key information and patterns within conversations
  • Track conversation context and relationships
  • Generate structured outputs from unstructured inputs

The challenge isn’t about making LLMs “remember” everything or extending context windows indefinitely. It’s about intelligently capturing and organizing valuable outputs as they occur naturally within existing limitations.

Think of it like this: When Spotify shows you your yearly wrapped, it’s not reconstructing your entire listening history from scratch…it’s because they’ve been intelligently tracking and categorizing key data points all along. AI platforms could implement similar systems for knowledge preservation without fundamentally altering their LLM architecture.
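
As a rough sketch of the kind of capture layer I mean (all names here are made up, not anyone’s actual API), the idea is to tag and store structured notes as the conversation happens, rather than reconstructing anything from raw transcripts later:

```python
# Hypothetical capture layer (all names made up): tag and store key data points
# as they occur, instead of reconstructing them from raw transcripts later.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class KnowledgeEvent:
    kind: str                       # e.g. "prompt_that_worked", "insight", "code_snippet"
    content: str
    tags: list[str] = field(default_factory=list)
    captured_at: datetime = field(default_factory=datetime.utcnow)

captured: list[KnowledgeEvent] = []

def maybe_capture(message: str, analysis: dict) -> None:
    """Called once per turn; `analysis` stands in for signals the platform
    already computes in real time (topics, intent, quality)."""
    if analysis.get("is_breakthrough"):
        captured.append(KnowledgeEvent("insight", message, analysis.get("topics", [])))
    if analysis.get("contains_code"):
        captured.append(KnowledgeEvent("code_snippet", message, ["code"]))
```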

The core capabilities are there. The computing infrastructure exists. What’s missing is the strategic prioritization to build this layer that connects existing capabilities into a coherent knowledge preservation system.

This isn’t about pushing against the technical limitations you’ve described… it’s about better utilizing what we already have. The first company to recognize and execute on this will have a significant competitive advantage, not because they’ve solved the fundamental LLM limitations, but because they’ve built something valuable within those limitations.

And yes, I’ve heard all the usual suggestions:

  • “Just use Markdown for documentation”
  • “There are open-source tools for this”
  • “Write a script to parse your exports”
  • “Set up a knowledge management system”

I’ve tried them all. The problem isn’t that solutions don’t exist…it’s that they all require significant time investment. When I’m deep in a flow state with an AI tool, the last thing I want to do is context switch to manually organize my insights. And let’s be real, saying “I’ll document this later” is where good ideas go to die lol.

1 Like

Imagine an export feature that lets you choose exactly what matters to you…

1. Core Knowledge Captures

  • Successful prompts that actually delivered results
  • Key insights and breakthrough moments
  • Pattern recognition in your problem-solving approaches
  • Learning progression timelines
  • Code snippets that worked (with their context!)
  • Topic relationship maps
  • Skill development trajectories

2. Customizable Export Options

Choose your focus as you please…

  • Code Centric
  • Learning Focused
  • Project Based
  • Pattern Analysis

Etc., etc.
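
Purely as an illustration (these field names are invented, not a real export API), a configurable export request could look something like this:

```python
# Purely illustrative: what a user-configurable export request might look like.
export_request = {
    "focus": "code_centric",      # or "learning_focused", "project_based", "pattern_analysis"
    "include": [
        "successful_prompts",
        "key_insights",
        "code_snippets_with_context",
        "topic_relationship_map",
        "skill_trajectory",
    ],
    "date_range": {"from": "2024-01-01", "to": "2024-12-31"},
    "format": "markdown",         # or "json", "graph"
}
```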

While I appreciate the hardware storage concern you’re highlighting, it significantly oversimplifies the solution. Modern data architecture doesn’t require raw storage of every interaction; it’s about intelligent indexing and selective preservation.

Companies like Spotify, Netflix, and even GitHub already implement sophisticated systems that track, categorize, and surface relevant user data without emptying NVMe drives all day… they use efficient data structures, intelligent compression, and selective storage strategies to manage massive amounts of user data efficiently.
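
As a toy illustration of “index the useful bits instead of storing everything” (names invented): only high-signal items get written, as distilled summaries indexed by topic, never raw transcripts:

```python
# Toy illustration (invented names): selective preservation plus a simple topic index,
# instead of archiving raw transcripts.
from collections import defaultdict

store: dict[str, str] = {}                                    # item_id -> distilled summary
topic_index: defaultdict[str, list[str]] = defaultdict(list)  # topic -> item_ids

def preserve(item_id: str, summary: str, topics: list[str], signal: float) -> None:
    if signal < 0.8:                 # below threshold: store nothing at all
        return
    store[item_id] = summary         # keep only the distilled version
    for topic in topics:
        topic_index[topic].append(item_id)  # cheap lookup later, no full-text scan

def recall(topic: str) -> list[str]:
    return [store[i] for i in topic_index[topic]]
```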

If you’re interested in diving deeper into implementation approaches and architectural solutions, DM me brother!! Always eager to exchange ideas with others actually thinking critically about these problems lol

You don’t put Arctic-vault storage in a real-time memory retrieval system…

lol

How about an export feature for ChatGPT data that doesn’t break after ~370 MB, so we could build it ourselves?

I would prefer that the known bugs get fixed before new features are implemented…

1 Like

Agreed.

It’s been 3 days and I have yet to receive the exported data that I have requested 4 different times.

1 Like

Could it be that ChatGPT doesn’t let us download the chat history on purpose?

I mean “upload your ChatGPT export and you can start chatting,” with tons more features like chatting across all chats…

Image: a chat conversation about “Chain of Thought” on the left and a connected graph of memory nodes on the right, titled “Graph Memory - extracting relevant nodes - how to get unlimited memory.”

That thing gets really big over time without a proper mechanism to structure it, and without scoring to make sure only the most relevant operations are done. When a user complains a lot about something, that would become more important, for example… but people will quickly find out that giving death threats to the chat in every message gets better results, and somehow the system becomes unusably slow haha…

Well yeah. Without proper scoring mechanisms and relevance filtering, any knowledge preservation system could be gamed or become unwieldy lol.

1 Like

We’re on the same page that indiscriminate storage isn’t the answer…

The core idea isn’t about archiving every single interaction, but rather about intelligent extraction and structured representation of meaningful information. Think more along the lines of how knowledge graphs are built… identifying entities, relationships, and key insights, rather than just accumulating raw text. A system like that would need to:

  • Identify genuinely valuable interactions
  • Filter out adversarial behavior
  • Weight relevance based on actual utility, not user intensity
  • Maintain performance at scale

The goal isn’t to store everything; it’s to preserve what matters while maintaining system usability.
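
As a hypothetical sketch (the signals are placeholders for metrics platforms could already derive), scoring would weight actual utility and deliberately ignore intensity, so shouting at the chat doesn’t help:

```python
# Hypothetical scoring pass: weight what was actually useful, ignore how loudly it was asked for.
def score_interaction(signals: dict) -> float:
    """`signals` stands in for metrics a platform could already derive per interaction."""
    return (
        2.0 * signals.get("answer_reused", 0)     # user came back to this result
        + 1.5 * signals.get("task_completed", 0)  # downstream task actually worked
        + 1.0 * signals.get("explicit_save", 0)   # user pinned or bookmarked it
        # deliberately no term for caps lock, repetition, or threats,
        # so the system can't be gamed by shouting at it
    )

KEEP_THRESHOLD = 2.0

def should_preserve(signals: dict) -> bool:
    return score_interaction(signals) >= KEEP_THRESHOLD
```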

Which then means you will need a lot of classification for each message…
And then there is the cost per request.

My normal chat activity using o3-mini over the API on openwebui came out to ~$50 per week - so the $200 price for Pro is really reasonable.

Now imagine you have to add safeguards and sorting mechanisms for each message. Say it’s only 10 requests per chat message and another 20 in follow-up reclassification passes using an LLM.

See where this is going?
Who wants to pay 20k per month for a perfect chat? :wink:
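
Quick back-of-envelope with those numbers (all assumptions, and pricing the extra calls like my normal usage; a “perfect” pipeline would use bigger models and longer contexts and land even higher):

```python
# Rough back-of-envelope, all assumptions (not real pricing).
baseline_weekly_cost = 50            # ~$50/week for my normal o3-mini usage, roughly one call per message
extra_calls_per_message = 10 + 20    # classification + reclassification passes
multiplier = 1 + extra_calls_per_message   # every message now triggers ~31 LLM calls

weekly = baseline_weekly_cost * multiplier
monthly = weekly * 4.33
print(f"~${weekly:,.0f}/week, ~${monthly:,.0f}/month")   # ~$1,550/week, ~$6,712/month
```

And that’s before retries, longer chains, or heavier models.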

*ping me if you want haha

1 Like

Smarter architecture, not just more processing. It’s about finding the right balance between cost and utility.

You are talking about

  1. Photonic Computing Interface
  2. Quantum-Graph Hybrid Database
  3. Neuromorphic Validation Core
  4. Real-Time Knowledge Orchestrator

Yeah, that would reduce energy usage by 800% and would be 200 times faster than an LLM.

But first of all, that is at an experimental and highly theoretical stage, built on the newest research from the last two weeks.
And secondly, the cost to make it real is most probably at least 50 to 80 million… But yeah, it would find a way to communicate with cancer and lure it into a gel so it could be extracted safely haha

Here’s hoping all of that turns up out of the half a billion to be spent, along with exploration using a sequence-dimension algorithm instead of context-attention KV for realistic cluster-hardware memory recall.

Their sequence-dimension approach is fascinating, especially the 1000x reduction in computational costs compared to traditional KV attention for 100M-token contexts.

Have you worked with similar architectures?

Since “memories” for current SOTA models all come from what is placed in the context input, the most widely available model that gives the longest “chat” before needing an external technique would be Google’s Gemini 2.0 Pro (experimental): up to 2 million tokens.

1 Like