Yet somehow, this fundamental need hasn’t been prioritized. Think about it: These companies already:
Track conversation history
Understand context
Identify patterns
Process complex relationships
The technical barrier isn’t the issue. They could build this. The infrastructure exists. The AI capabilities are there. In fact, their models are already doing most of this analysis in real time… they’re just not preserving it.
It’s particularly interesting because this isn’t just a “nice to have” feature. This is about:
Maximizing user value
Increasing platform stickiness
Building competitive advantage
Enabling genuine user growth
For companies racing to build the next viral feature or chasing the latest AI breakthrough, this feels like overlooking the obvious (to say the least…). It’s like having a Ferrari but forgetting to install a steering wheel.
What you’re asking for is a much more complicated problem to solve than you think. Current LLMs are, at their core, an autocomplete algorithm — give them text as input, and they’ll predict the next word (or “token”, a piece of a word) based on patterns they learned from their dataset during training.
Format that text like a conversation, and you’ve got a chatbot.
Train the model on hard problems and reward it when it solves them, and you’ve got models like o1 and o3.
But you’ve still got to give the model text as input. There’s a physical limit — whether it’s RAM, storage, or compute time — imposed by whatever hardware these models are running on.
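To make that concrete, here’s a minimal sketch (the helper names and token limit are made up for illustration) of “chatbot as autocomplete” plus a hard context window: the conversation is flattened into one text block, and once it outgrows the window, the oldest turns simply get dropped.

```python
# Minimal sketch (hypothetical helper names): a "chat" is just text handed
# to a next-token predictor, and the context window caps how much fits.

CONTEXT_LIMIT_TOKENS = 128_000  # whatever the deployed model/hardware allows

def format_as_prompt(messages: list[dict]) -> str:
    """Flatten a conversation into the single text block the model actually sees."""
    lines = [f"{m['role'].upper()}: {m['content']}" for m in messages]
    lines.append("ASSISTANT:")  # ask the model to continue from here
    return "\n".join(lines)

def rough_token_count(text: str) -> int:
    """Crude token estimate (~4 characters per token) for budgeting purposes."""
    return len(text) // 4

def fit_into_context(messages: list[dict]) -> list[dict]:
    """Drop the oldest turns until the prompt fits the physical context limit."""
    kept = list(messages)
    while kept and rough_token_count(format_as_prompt(kept)) > CONTEXT_LIMIT_TOKENS:
        kept.pop(0)  # the oldest "memories" are simply the first thing to go
    return kept
```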
You could keep “training” the model on user input over time to have it “remember” things from the past without having to include it as input, but you’d have to do that per-user and the costs would be stupidly high.
Context length is improving at a massive rate. Google’s Gemini can already take multiple novels as input. That’s more than everything an average person is willing to say about themselves.
Thank you for the thoughtful breakdown. While you raise valid points about LLMs’ fundamental architecture and limitations, I believe this actually reinforces rather than contradicts my original argument.
The solution I’m advocating for doesn’t necessarily require expanding the core LLM capabilities or implementing costly per-user training. The infrastructure and technical components already exist… they’re just not being leveraged optimally.
Consider how these platforms already:
Process and understand complex queries in real-time
Identify key information and patterns within conversations
Track conversation context and relationships
Generate structured outputs from unstructured inputs
The challenge isn’t about making LLMs “remember” everything or extending context windows indefinitely. It’s about intelligently capturing and organizing valuable outputs as they occur naturally within existing limitations.
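A rough sketch of what that capture-as-you-go layer could look like, with every class and function name purely illustrative: after each exchange, a cheap extraction step stores a small structured note instead of the raw transcript, and those notes stay searchable later.

```python
# Sketch of the "capture as you go" layer (all names are illustrative):
# distill structured notes from each exchange instead of archiving raw text.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class KnowledgeNote:
    topic: str
    insight: str
    source_chat_id: str
    created_at: datetime = field(default_factory=datetime.now)

class KnowledgeStore:
    """Tiny in-memory stand-in for the persistence layer."""
    def __init__(self) -> None:
        self.notes: list[KnowledgeNote] = []

    def add(self, note: KnowledgeNote) -> None:
        self.notes.append(note)

    def search(self, keyword: str) -> list[KnowledgeNote]:
        kw = keyword.lower()
        return [n for n in self.notes
                if kw in n.topic.lower() or kw in n.insight.lower()]

def capture_exchange(store: KnowledgeStore, chat_id: str,
                     user_msg: str, assistant_msg: str) -> None:
    # In a real system this would be a cheap structured-output call to the
    # same model; here a trivial length heuristic stands in for it.
    if len(assistant_msg) > 200:  # assume long answers carry reusable insight
        store.add(KnowledgeNote(topic=user_msg[:60],
                                insight=assistant_msg[:280],
                                source_chat_id=chat_id))
```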
Think of it like this: When Spotify shows you your yearly Wrapped, it’s not reconstructing your entire listening history from scratch… it’s that they’ve been intelligently tracking and categorizing key data points all along. AI platforms could implement similar systems for knowledge preservation without fundamentally altering their LLM architecture.
The core capabilities are there. The computing infrastructure exists. What’s missing is the strategic prioritization to build this layer that connects existing capabilities into a coherent knowledge preservation system.
This isn’t about pushing against the technical limitations you’ve described… it’s about better utilizing what we already have. The first company to recognize and execute on this will have a significant competitive advantage, not because they’ve solved the fundamental LLM limitations, but because they’ve built something valuable within those limitations.
“Just use Markdown for documentation”
“There are open-source tools for this”
“Write a script to parse your exports”
“Set up a knowledge management system”
I’ve tried them all. The problem isn’t that solutions don’t exist… it’s that they all require significant time investment. When I’m deep in a flow state with an AI tool, the last thing I want to do is context switch to manually organize my insights. And let’s be real, saying “I’ll document this later” is where good ideas go to die lol.
While I appreciate the hardware storage concern you’re highlighting, this significantly oversimplifies the solution. Modern data architecture doesn’t require raw storage of every interaction; it’s about intelligent indexing and selective preservation.
Companies like Spotify, Netflix, and even GitHub already implement sophisticated systems that track, categorize, and surface relevant user data without burning through NVMe all day… they use efficient data structures, intelligent compression, and selective storage strategies to manage massive amounts of user data efficiently.
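To illustrate the selective-preservation point (the scoring heuristic and threshold below are invented for the example, not any platform’s real pipeline): score each interaction by utility signals, keep only what clears a threshold, compress it, and index a handful of key terms for later retrieval.

```python
# Sketch of "selective preservation" rather than raw storage
# (illustrative scoring heuristic, not anyone's actual system).

import zlib
from typing import NamedTuple, Optional

class IndexedInsight(NamedTuple):
    key_terms: frozenset      # tiny inverted-index entry for retrieval
    compressed: bytes         # compressed text instead of the raw transcript
    score: float              # how worth keeping this was judged to be

def score_interaction(text: str, was_reused: bool) -> float:
    # Relevance comes from utility signals (re-use, references), not volume.
    return (2.0 if was_reused else 0.0) + min(len(text) / 1000, 1.0)

def preserve(text: str, was_reused: bool,
             threshold: float = 1.0) -> Optional[IndexedInsight]:
    score = score_interaction(text, was_reused)
    if score < threshold:
        return None  # most interactions are simply not kept at all
    terms = frozenset(w.lower() for w in text.split() if len(w) > 4)
    return IndexedInsight(terms, zlib.compress(text.encode()), score)
```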
If you’re interested in diving deeper into implementation approaches and architectural solutions, DM me, brother!! Always eager to exchange ideas with others actually thinking critically about these problems lol
I mean “upload your ChatGPT export and you can start chatting”, with tons more features like chatting over all your chats…
…
That thing gets really big over time without a proper mechanism to structure it and a scoring system to make sure only the most relevant operations are done. When a user complains a lot about something, that would become more important, for example… but people would find out fast that giving the chat death threats in every message gets better results, and somehow the system would get unusably slow haha…
We’re on the same page that indiscriminate storage isn’t the answer…
The core idea isn’t about archiving every single interaction, but rather about intelligent extraction and structured representation of meaningful information. Think more along the lines of how knowledge graphs are built… identifying entities, relationships, and key insights rather than just accumulating raw text.
A system like that would need to:
Identify genuinely valuable interactions
Filter out adversarial behavior
Weight relevance based on actual utility, not user intensity
Maintain performance at scale
The goal isn’t to store everything; it’s to preserve what matters while maintaining system usability.
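Here’s one way that could look, purely as an illustration (the adversarial check is a toy stand-in for a real classifier, and all names are hypothetical): extract weighted triples knowledge-graph style, boost them by demonstrated utility, and refuse to treat intensity or abuse as a relevance signal.

```python
# Rough sketch of extraction-over-archiving: store (subject, relation, object)
# triples with a utility-based weight, and ignore adversarial "pressure"
# instead of rewarding it. Illustrative only.

from collections import defaultdict

ADVERSARIAL_MARKERS = {"threat", "or else", "!!!!"}  # toy stand-in for a real classifier

def looks_adversarial(message: str) -> bool:
    msg = message.lower()
    return any(marker in msg for marker in ADVERSARIAL_MARKERS)

class KnowledgeGraph:
    def __init__(self) -> None:
        # (subject, relation, object) -> accumulated utility weight
        self.triples: dict[tuple, float] = defaultdict(float)

    def record(self, triple: tuple, utility: float, message: str) -> None:
        if looks_adversarial(message):
            return  # user intensity or abuse must not raise importance
        self.triples[triple] += utility

    def top(self, n: int = 10) -> list:
        """Return the n highest-weighted facts for fast, bounded retrieval."""
        return sorted(self.triples.items(), key=lambda kv: kv[1], reverse=True)[:n]
```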
Which then means you’ll need a lot of classification for each message…
And then there is cost per request.
My normal chat activity using o3-mini over the API on openwebui came out to ~$50 per week, so the $200 price for Pro is really reasonable.
Now imagine you have to add safeguards and sorting mechanisms for each message. Say it’s only 10 requests per chat message and another 20 in follow-up reclassification passes using an LLM.
See where this is going?
Who wants to pay 20k per month for a perfect chat?
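To make the scaling concrete, here’s the back-of-the-envelope math under those assumptions (the baseline spend and per-message call counts are the rough figures quoted above, nothing measured): the bill grows roughly linearly with the number of extra LLM calls per message, and that’s before the classification prompts themselves get longer than the original chat turns.

```python
# Back-of-the-envelope cost scaling under the numbers quoted above
# (all figures are rough estimates from the thread, not measurements).

BASELINE_WEEKLY_USD = 50           # ~$50/week of plain o3-mini chat via the API
EXTRA_CALLS_PER_MESSAGE = 10 + 20  # 10 safeguard/sorting calls + 20 reclassification calls
WEEKS_PER_MONTH = 4.3

# If every extra call cost roughly as much as the original chat call,
# the bill would scale about linearly with the call count:
scaled_monthly = BASELINE_WEEKLY_USD * (1 + EXTRA_CALLS_PER_MESSAGE) * WEEKS_PER_MONTH
print(f"~${scaled_monthly:,.0f}/month")  # well into the thousands before prompt growth
```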
Yeah, that would cut energy usage roughly eightfold and would be 200 times faster than an LLM.
But first of all, that is at an experimental and highly theoretical stage, built on the newest research from the last two weeks.
And secondly, the cost to make it real is most probably at least 50 to 80 million… But yeah, it would find a way to communicate with cancer and lure it into a gel so it could be extracted safely haha
Here’s hoping they turn all of that up, off the half a billion to be spent, and explore a sequence-dimension algorithm instead of context-attention K-V for realistic memory recall on cluster hardware.
Their sequence-dimension approach is fascinating, especially the 1000x reduction in computational cost compared to traditional K-V attention for 100M-token contexts.
Since “memories” in current SOTA models all come from what gets placed in the context input, the most widely available model that gives the longest “chat” before needing an external technique would be Google’s Gemini 2.0 Pro (experimental): up to 2 million tokens.