Feature Request - Client-Side Storage for Extended ChatGPT Conversations

In extended dialogues, ChatGPT faces a limitation due to its token cap, causing it to “forget” prior parts of a conversation. I’ve personally run into this while trying to use the platform for in-depth creative work. To mitigate this, could the entire conversation be stored on the client side and sent back to the servers for context as needed?
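Something like the following minimal sketch is what I have in mind, assuming the Chat Completions call from the `openai` Python package; the local file path and model name are placeholder assumptions, not a definitive design:

```python
# Sketch: keep the whole conversation in a local JSON file ("client side")
# and send it back with every request so the model regains full context.
import json
from pathlib import Path

from openai import OpenAI

HISTORY_FILE = Path("conversation.json")  # hypothetical local store
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def load_history() -> list[dict]:
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []


def ask(user_message: str) -> str:
    history = load_history()
    history.append({"role": "user", "content": user_message})

    # The full stored conversation is transmitted on every call.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name, for illustration only
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    HISTORY_FILE.write_text(json.dumps(history, indent=2))
    return reply
```

Each call transmits the whole stored history, which is exactly where the transmission-overhead concern below comes from.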

I see some advantages and disadvantages to this:

  • (Plus) Extended Conversation Length: This approach would enable longer, uninterrupted dialogues with ChatGPT, as the model could continually reference prior parts of the conversation.
  • (Plus) Enhanced User Experience: Users engaged in creative or complex conversations, such as constructing a fantasy language (my personal example), would benefit from a consistent, context-aware interaction without needing to recap or re-explain prior inputs.
  • (Plus) Privacy-Preserving: Storing conversation data on the client side, without long-term retention on the server, could appeal to privacy-conscious users and align with best practices for user data handling.
  • (Minus) Transmission Overhead: As the conversation grows in length, continually transmitting the full conversation back to the server might introduce latency, especially for users with slower internet connections.
  • (Minus) Fixed Token Limit: While the model could reintroduce older parts of the conversation, even storing only a referential index for each token would still hit the token ceiling eventually.
  • (Minus) Implementation Complexity: This feature would probably require significant changes to the client-server interaction. Ensuring data integrity, synchronization, and a smooth user experience might require a standalone application rather than just a web GUI.

I know it sounds like I’m arguing against my own idea, but I’ve given this a fair amount of thought, and I really would love to see future iterations of ChatGPT have a vastly larger conversational memory - through whatever means.

Here’s the thing: OpenAI already has the ability to do this themselves. You can already see the entire conversation of the session, served from their database; it is just not fully utilized.

There is strong motivation for keeping the past conversation that is sent along with each new question to an AI model to a minimum: include only enough prior turns to maintain a passable illusion of memory, just enough for the model to understand follow-up questions.
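As a rough illustration of the kind of trimming such a management layer might do (only a sketch, not how ChatGPT actually does it; the function name and `max_turns` value are arbitrary assumptions):

```python
def trim_to_recent_turns(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep any system prompt plus only the last few exchanges: enough for
    follow-up questions, but not real long-term memory."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-max_turns * 2:]  # each turn = user + assistant message
```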

A larger amount of text loaded into the AI engine’s context, which includes whatever conversation the management system chooses to offer it for understanding, increases the computational cost of generating the next output one token at a time, and it also slows down the token generation rate. Everything previously seen must be considered when computing how the input should shape the next piece of generated language.

The “token cap” is the context length of the model. It simply cannot be exceeded. Even if someone wanted to pass along the maximum amount of conversation instead of the minimum (as can be done when paying for usage through the developer API), this is an unbreakable limit for long sessions, and it requires software management of the past conversation.
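For example, a developer calling the API has to fit the conversation under that ceiling themselves. A rough sketch of such management, assuming `tiktoken` for approximate token counting; the encoding name and budget are illustrative only:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding, for illustration


def fit_to_context(history: list[dict], max_tokens: int = 8000) -> list[dict]:
    """Drop the oldest messages until the rough token count fits the budget.
    Real accounting also has to reserve room for the model's reply."""
    def count(msgs: list[dict]) -> int:
        return sum(len(enc.encode(m["content"])) for m in msgs)

    trimmed = list(history)
    while trimmed and count(trimmed) > max_tokens:
        trimmed.pop(0)  # oldest message goes first
    return trimmed
```

Something along these lines is the “software management of the past conversation” that any long session ends up needing.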

Comprehension can also be reduced when the context is filled with lots of obsolete conversation.
