The Elephant in the Room: Why No Persistent Conversational Memory in LLMs?

Here’s what’s fascinating… OpenAI, Anthropic, Google, Meta, Perplexity, etc. These companies all have:

  • World-class AI engineers
  • Cutting-edge language models
  • Massive computing infrastructure
  • Deep user behavior data

Yet somehow, this fundamental need hasn’t been prioritized. Think about it: These companies already:

  • Track conversation history
  • Understand context
  • Identify patterns
  • Process complex relationships

The technical barrier isn’t the issue. They could build this. The infrastructure exists. The AI capabilities are there. In fact, their models are already doing most of this analysis in real time…they’re just not preserving it.

It’s particularly interesting because this isn’t just a “nice to have” feature. This is really about

  • Maximizing user value
  • Increasing platform stickiness
  • Building competitive advantage
  • Enabling genuine user growth

For companies racing to build the next viral feature or chasing the latest AI breakthrough, this feels like overlooking the obvious (to say the least…). It’s like having a Ferrari but forgetting to install a steering wheel :joy:.

13 Likes

Because you would need a thousand guys unpacking NVMe drives all day…

2 Likes

What you’re asking is a much more complicated problem to solve than you think. Current LLMs are, at the core, an autocomplete algorithm — give them text as input, and they’ll predict the next word (or “token”, piece of word) based on patterns they learned from their dataset during training.

Format that text like a conversation, and you’ve got a chatbot.

Train the model on hard problems and reward it when it solves them, and you’ve got models like o1 and o3.

But you’ve still got to give the model text as input. There’s a physical limit — whether it be the amount of RAM, storage, or compute time — based on whatever hardware these models are running on.

You could keep “training” the model on user input over time to have it “remember” things from the past without having to include it as input, but you’d have to do that per-user and the costs would be stupidly high.

Context length is improving at a massive rate. Google’s Gemini can already take multiple novels as input. That’s more than everything an average person is willing to say about themselves.
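
To make the “format that text like a conversation” point concrete, here’s a minimal sketch (the `generate` function is a stand-in for any completion call, not any particular API):

```python
# Minimal sketch: a "chatbot" is a next-token predictor fed formatted text.
# `generate` is a hypothetical stand-in for any LLM completion call, not a real API.

def generate(prompt: str) -> str:
    """Pretend next-token predictor; a real call would hit an LLM endpoint."""
    return "..."  # placeholder completion

history: list[tuple[str, str]] = []  # the only "memory" the model ever sees

def chat(user_message: str) -> str:
    history.append(("user", user_message))
    # Every turn, the *entire* transcript is re-serialized into one prompt.
    prompt = "\n".join(f"{role}: {text}" for role, text in history) + "\nassistant:"
    reply = generate(prompt)
    history.append(("assistant", reply))
    return reply

# Once the serialized history exceeds the context window (a hard RAM/compute limit),
# something has to be dropped, summarized, or stored outside the model.
```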

4 Likes

Thank you for the thoughtful breakdown. While you raise valid points about LLMs’ fundamental architecture and limitations, I believe this actually reinforces rather than contradicts my original argument.

The solution I’m advocating for doesn’t necessarily require expanding core LLM capabilities or implementing costly per-user training. The infrastructure and technical components already exist…they’re just not being leveraged optimally.

Consider how these platforms already:

  • Process and understand complex queries in real-time
  • Identify key information and patterns within conversations
  • Track conversation context and relationships
  • Generate structured outputs from unstructured inputs

The challenge isn’t about making LLMs “remember” everything or extending context windows indefinitely. It’s about intelligently capturing and organizing valuable outputs as they occur naturally within existing limitations.

Think of it like this: When Spotify shows you your yearly wrapped, it’s not reconstructing your entire listening history from scratch…it’s because they’ve been intelligently tracking and categorizing key data points all along. AI platforms could implement similar systems for knowledge preservation without fundamentally altering their LLM architecture.
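
As a rough sketch of the kind of capture layer I mean (all names here are made up, not anyone’s actual API), the idea is to tag and store structured notes as the conversation happens, rather than reconstructing anything from raw transcripts later:

```python
# Hypothetical capture layer (all names made up): tag and store key data points
# as they occur, instead of reconstructing them from raw transcripts later.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class KnowledgeEvent:
    kind: str                       # e.g. "prompt_that_worked", "insight", "code_snippet"
    content: str
    tags: list[str] = field(default_factory=list)
    captured_at: datetime = field(default_factory=datetime.utcnow)

captured: list[KnowledgeEvent] = []

def maybe_capture(message: str, analysis: dict) -> None:
    """Called once per turn; `analysis` stands in for signals the platform
    already computes in real time (topics, intent, quality)."""
    if analysis.get("is_breakthrough"):
        captured.append(KnowledgeEvent("insight", message, analysis.get("topics", [])))
    if analysis.get("contains_code"):
        captured.append(KnowledgeEvent("code_snippet", message, ["code"]))
```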

The core capabilities are there. The computing infrastructure exists. What’s missing is the strategic prioritization to build this layer that connects existing capabilities into a coherent knowledge preservation system.

This isn’t about pushing against the technical limitations you’ve described… it’s about better utilizing what we already have. The first company to recognize and execute on this will have a significant competitive advantage, not because they’ve solved the fundamental LLM limitations, but because they’ve built something valuable within those limitations.

And yes, I’ve heard all the usual suggestions:

  • “Just use Markdown for documentation”
  • “There are open-source tools for this”
  • “Write a script to parse your exports”
  • “Set up a knowledge management system”

I’ve tried them all. The problem isn’t that solutions don’t exist…it’s that they all require significant time investment. When I’m deep in a flow state with an AI tool, the last thing I want to do is context switch to manually organize my insights. And let’s be real, saying “I’ll document this later” is where good ideas go to die lol.

1 Like

Imagine an export feature that lets you choose exactly what matters to you…

1. Core Knowledge Captures

  • Successful prompts that actually delivered results
  • Key insights and breakthrough moments
  • Pattern recognition in your problem-solving approaches
  • Learning progression timelines
  • Code snippets that worked (with their context!)
  • Topic relationship maps
  • Skill development trajectories

2. Customizable Export Options

Choose your focus as you please…

  • Code Centric
  • Learning Focused
  • Project Based
  • Pattern Analysis

Etc., etc.
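
Purely as an illustration (these field names are invented, not a real export API), a configurable export request could look something like this:

```python
# Purely illustrative: what a user-configurable export request might look like.
export_request = {
    "focus": "code_centric",      # or "learning_focused", "project_based", "pattern_analysis"
    "include": [
        "successful_prompts",
        "key_insights",
        "code_snippets_with_context",
        "topic_relationship_map",
        "skill_trajectory",
    ],
    "date_range": {"from": "2024-01-01", "to": "2024-12-31"},
    "format": "markdown",         # or "json", "graph"
}
```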

While I appreciate the hardware storage concern you’re highlighting, it significantly oversimplifies the solution. Modern data architecture doesn’t require raw storage of every interaction; it’s about intelligent indexing and selective preservation.

Companies like Spotify, Netflix, and even GitHub already implement sophisticated systems that track, categorize, and surface relevant user data without emptying NVMe drives all day… they use efficient data structures, intelligent compression, and selective storage strategies to manage massive amounts of user data efficiently.
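
As a toy illustration of “index the useful bits instead of storing everything” (names invented): only high-signal items get written, as distilled summaries indexed by topic, never raw transcripts:

```python
# Toy illustration (invented names): selective preservation plus a simple topic index,
# instead of archiving raw transcripts.
from collections import defaultdict

store: dict[str, str] = {}                                    # item_id -> distilled summary
topic_index: defaultdict[str, list[str]] = defaultdict(list)  # topic -> item_ids

def preserve(item_id: str, summary: str, topics: list[str], signal: float) -> None:
    if signal < 0.8:                 # below threshold: store nothing at all
        return
    store[item_id] = summary         # keep only the distilled version
    for topic in topics:
        topic_index[topic].append(item_id)  # cheap lookup later, no full-text scan

def recall(topic: str) -> list[str]:
    return [store[i] for i in topic_index[topic]]
```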

If you’re interested in diving deeper into implementation approaches and architectural solutions, DM me brother!! Always eager to exchange ideas with others actually thinking critically about these problems lol

You don’t put Arctic-vault storage in a real-time memory retrieval system…

lol

How about an export feature for ChatGPT data that doesn’t break after ~370 MB, so we could build it ourselves?

I would prefer that the known bugs get fixed before new features are implemented…

1 Like

Agreed.

It’s been 3 days and I have yet to receive the exported data that I have requested 4 different times.

1 Like

Could it be that ChatGPT doesn’t let us download the chat history on purpose?

I mean “upload your ChatGPT export and you can start chatting,” with tons more features like chatting across all chats…

Image: a chat conversation about “Chain of Thought” on the left and a connected graph of memory nodes on the right, titled “Graph Memory - extracting relevant nodes - how to get unlimited memory.”

That thing gets really big over time without a proper mechanism to structure it, and without scoring to make sure only the most relevant operations are done. When a user complains a lot about something, that would become more important, for example… but people will quickly find out that giving death threats to the chat in every message gets better results, and somehow the system becomes unusably slow haha…

Well yeah. Without proper scoring mechanisms and relevance filtering, any knowledge preservation system could be gamed or become unwieldy lol.

1 Like

We’re on the same page that indiscriminate storage isn’t the answer…

The core idea isn’t about archiving every single interaction, but rather about intelligent extraction and structured representation of meaningful information. Think more along the lines of how knowledge graphs are built… identifying entities, relationships, and key insights, rather than just accumulating raw text. A system like that would need to:

  • Identify genuinely valuable interactions
  • Filter out adversarial behavior
  • Weight relevance based on actual utility, not user intensity
  • Maintain performance at scale

The goal isn’t to store everything; it’s to preserve what matters while maintaining system usability.
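
As a hypothetical sketch (the signals are placeholders for metrics platforms could already derive), scoring would weight actual utility and deliberately ignore intensity, so shouting at the chat doesn’t help:

```python
# Hypothetical scoring pass: weight what was actually useful, ignore how loudly it was asked for.
def score_interaction(signals: dict) -> float:
    """`signals` stands in for metrics a platform could already derive per interaction."""
    return (
        2.0 * signals.get("answer_reused", 0)     # user came back to this result
        + 1.5 * signals.get("task_completed", 0)  # downstream task actually worked
        + 1.0 * signals.get("explicit_save", 0)   # user pinned or bookmarked it
        # deliberately no term for caps lock, repetition, or threats,
        # so the system can't be gamed by shouting at it
    )

KEEP_THRESHOLD = 2.0

def should_preserve(signals: dict) -> bool:
    return score_interaction(signals) >= KEEP_THRESHOLD
```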

Which then means you will need a lot of classification for each message…
And then there is the cost per request.

My normal chat activity using o3-mini over the API on openwebui came out to ~$50 per week - so the $200 price for Pro is really reasonable.

Now imagine you have to add safeguards and sorting mechanisms for each message. Say it’s only 10 requests per chat message and another 20 in follow-up reclassification passes using an LLM.

See where this is going?
Who wants to pay 20k per month for a perfect chat? :wink:
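
Quick back-of-envelope with those numbers (all assumptions, and pricing the extra calls like my normal usage; a “perfect” pipeline would use bigger models and longer contexts and land even higher):

```python
# Rough back-of-envelope, all assumptions (not real pricing).
baseline_weekly_cost = 50            # ~$50/week for my normal o3-mini usage, roughly one call per message
extra_calls_per_message = 10 + 20    # classification + reclassification passes
multiplier = 1 + extra_calls_per_message   # every message now triggers ~31 LLM calls

weekly = baseline_weekly_cost * multiplier
monthly = weekly * 4.33
print(f"~${weekly:,.0f}/week, ~${monthly:,.0f}/month")   # ~$1,550/week, ~$6,712/month
```

And that’s before retries, longer chains, or heavier models.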

*ping me if you want haha

1 Like

Smarter architecture, not just more processing. It’s about finding the right balance between cost and utility.

You are talking about

  1. Photonic Computing Interface
  2. Quantum-Graph Hybrid Database
  3. Neuromorphic Validation Core
  4. Real-Time Knowledge Orchestrator

Yeah, that would reduce energy usage by 800% and would be 200 times faster than an LLM.

But first of all, that is at an experimental and highly theoretical stage, built on the newest research from the last two weeks.
And secondly, the cost to make it real is most probably at least 50 to 80 million… But yeah, it would find a way to communicate with cancer and lure it into a gel so it could be extracted safely haha

Here’s hoping all of that turns up out of the half a billion to be spent, along with exploration using a sequence-dimension algorithm instead of context-attention KV for realistic cluster-hardware memory recall.

Their sequence-dimension approach is fascinating, especially the 1000x reduction in computational costs compared to traditional KV attention for 100M-token contexts.

Have you worked with similar architectures?

Since “memories” for current SOTA models all come from what is placed in the context input, the most widely available model that gives the longest “chat” before needing an external technique would be Google’s Gemini 2.0 Pro (experimental): up to 2 million tokens.

1 Like