Contextual long-term memory

Even as a senior programmer in PHP and Python, I constantly suffer from the delusion that an AI model from OpenAI will always find the perfect answer to a question. This is probably because I started programming in PHP more than 20 years ago. What wonderful times these are, and how nice that I get to experience talking to a program, or to a model that a program has trained. But that is not all there is to it, as I painfully realize whenever I disregard the context, without which the best model in the world merely seems like an intelligent being suffering from forgetfulness. So what are these great models worth if you don't carefully and attentively feed them their contextual memory?
I am currently working on a fun project: a program that lets OpenAI's API models speak and transcribes one's microphone input, so that the overall impression of a real conversation is created. I know it has been invented before, but I wanted to know what's under the hood, especially with regard to the memory phenomenon. I have also tried programs where the model was so heavily trimmed to be a professor that it was no longer fun. My fun project is already working quite well, but I had to refresh my knowledge of Python in many areas, and I was sometimes too spoiled by the more forgiving PHP language, whose community also provides much more centralized help. Be that as it may, I am still fiddling with the windows and parameters of the individual modules to improve the program's appearance. I would also like to thank the people at OpenAI for the great work they are doing; I can imagine how difficult it is, especially in relation to the issue above, to always stay on the right path, particularly when it comes to pricing tokens at a level that provides an appropriate basis for all this.
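For readers who want to build something similar, here is a minimal sketch of one round of such a conversation loop. It assumes the `openai` (v1.x) and `google-cloud-texttospeech` Python packages, valid credentials in the environment, and that the microphone audio has already been captured to `recording.wav`; the model names and file paths are illustrative, not prescriptions.

```python
# One turn of a voice chat: transcribe mic audio, ask the model, speak the reply.
from openai import OpenAI
from google.cloud import texttospeech

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
tts_client = texttospeech.TextToSpeechClient()

# 1) Speech -> text via Whisper (assumes recording.wav was captured beforehand).
with open("recording.wav", "rb") as f:
    transcript = openai_client.audio.transcriptions.create(model="whisper-1", file=f)

# 2) Text -> model reply.
chat = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
reply = chat.choices[0].message.content

# 3) Reply -> speech via Google Cloud TTS.
audio = tts_client.synthesize_speech(
    input=texttospeech.SynthesisInput(text=reply),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3),
)
with open("reply.mp3", "wb") as out:
    out.write(audio.audio_content)
```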
I have often used the models myself to refresh my rusty Python and simply have codebases generated for me. This works almost better with the 3.5 models than with GPT-4. But with this project I am now far beyond that, and I have found that it can also be pleasant to work out solutions interactively with the models. For example, I notice that GPT-3.5-Turbo does not know every aspect of the pywebview API. I chose this great module to make the program more portable. This makes a model more like a good colleague (in the technical sense) who walks through solutions with me step by step. That is just as helpful and enjoyable as simply generating code.
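For anyone curious why pywebview helps portability, the core pattern is tiny: you hand it an HTML UI and it renders it in the platform's native browser engine. A minimal sketch (the window title and HTML path are placeholders):

```python
import webview

# pywebview wraps the OS's native web view (EdgeWebView2, WebKit, GTK WebKit),
# so the same UI code runs on Windows, macOS, and Linux without bundling a browser.
window = webview.create_window("Voice Chat", "assets/index.html", width=480, height=640)
webview.start()  # blocks; run chat logic in a background thread or expose a js_api
```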

However, this also requires a model to remember the code that one talks about all the time. It makes the task more difficult and reduces the advantage of a chat if the code base fades or is completely "forgotten" (in the technical sense). I had a funny experience today while talking, in the ChatGPT subscription, to a GPT-3.5-Turbo model about my Python voice chat, which was likewise built with the OpenAI API and Google Cloud's TTS API. We had clearly established between us, over weeks, what remembering and forgetting mean in a technical sense. Yet when asked about this, the language model explained only a few dialogues later that it had no human memory, which is why it could neither forget nor remember. I mention this example to illustrate the problem; it is not meant as criticism, but rather as a dialogue about how we can tame the chat models in order to make them fit for different tasks and performance levels in terms of their "memory". Limiting the context to 3-4 dialogues is probably the wrong approach. Increasing the context size to 32k tokens might be a step in the right direction, as long as it doesn't leave you bald from pulling your hair out. A fantastic approach is using embeddings and vectorization, as seen in many Git projects, and there are great projects that almost completely neutralize the problem. The only thing that must not happen is that this drives up the price of tokens; otherwise, the consumer market and small IT companies will be left behind.
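To make the trade-off concrete, here is a hedged sketch of the naive "limit the context" approach I am arguing against: keep only as many recent turns as fit a token budget, and let everything older fade. The 4-characters-per-token estimate is a rough heuristic; real code would use a tokenizer such as tiktoken.

```python
# Naive sliding-window memory: keep only the newest turns that fit the budget.
def trim_history(messages: list[dict], max_tokens: int = 3000) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):           # walk from newest to oldest
        cost = len(msg["content"]) // 4 + 4  # crude per-message token estimate
        if used + cost > max_tokens:
            break                            # older turns are "forgotten" here
        kept.append(msg)
        used += cost
    return list(reversed(kept))              # restore chronological order
```

This is exactly what makes the code base fade: anything that scrolls past the budget is gone, which is why embedding-based recall (discussed below) is so appealing.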


Very good question. Currently we're using embeddings with a vector index such as Faiss to work around the memory limitation in Sharly AI.
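For context, the basic Faiss pattern is short. A minimal sketch, with random vectors standing in for real embeddings of past conversation turns (the 1536 dimension matches OpenAI's text-embedding-ada-002, but any embedding model works):

```python
import numpy as np
import faiss  # vector index used to look up semantically similar past turns

dim = 1536                      # e.g. OpenAI text-embedding-ada-002 vectors
index = faiss.IndexFlatL2(dim)  # exact L2 search; fine for one conversation's turns

# Stand-ins for the stored embeddings of 100 earlier turns.
past_turn_vectors = np.random.rand(100, dim).astype("float32")
index.add(past_turn_vectors)

# Embedding of the new question; retrieve the 5 most similar old turns.
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # ids index into the stored turns
```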


Oh man, I have this problem in spades. I'm using the web interface so far. Is the long-term memory any better via the API?
I have set aside "rule sets" used as prompts to summarize long bits of context. Then I can keep refreshing the rules when ChatGPT starts to forget them. Painful, but it saves some time.
My goal is to let ChatGPT transliterate some tens of thousands of lines of Fortran 77 to Java. ChatGPT does an amazing job, even changing code patterns to be more OO. I was thrilled at first. But the 4096-character limit cramps my style: a single input Fortran subroutine might have 6,000-7,000 characters. I can work around this by saying I will enter two blocks of code which should be concatenated in memory. This works to a limited degree, but her (ChatGPT's) memory buffer for concatenation is less than 16 kB, or so I believe.
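If it helps anyone automating this, the splitting step is mechanical. A hedged sketch, assuming line-oriented Fortran 77 source (so blocks break on line boundaries rather than mid-statement); the limit value is whatever your interface actually accepts:

```python
# Split a long source file into blocks that fit under a paste-size limit,
# breaking on line boundaries so no statement is cut in half.
def split_into_blocks(source: str, limit: int = 4000) -> list[str]:
    blocks, current = [], ""
    for line in source.splitlines(keepends=True):
        if current and len(current) + len(line) > limit:
            blocks.append(current)
            current = ""
        current += line  # note: a single line longer than limit passes through whole
    if current:
        blocks.append(current)
    return blocks

# Each block is then pasted as "part i of n; concatenate before translating".
```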

Once I have (a) the rules and (b) the subroutine uploaded, a complete line-for-line translation is too long for her response character limit. So she writes things like "… rest of Java logic goes here", or "… truncating for brevity", or any of thousands of similar remarks. This is not what I want.

So with even more effort on my part to help her keep track, I convince her to break up her response into multiple code boxes, opening a new box when she hits the output limit. She can do this with the right prompting.

But by this point, her context memory is almost full. She can give one translation attempt, but it's never what I want the first time. So I say, great! Now do it again while reinterpreting these three lines of output code to this pattern. She can do this too!

But I don't converge on the perfect translation before her context memory overflows and she starts to produce wrong code, or even sometimes gibberish!

So I start over.

Are there other tricks? Does the API have greater context memory? Even if so, I'm sure I'll hit the new limits before long.

Can I create my own instance and increase the context limits? Or maybe train my instance with special Fortran-to-Java rules embedded, so I don't have to keep repeating them? Or somehow plant a flag called MyRestartPoint and have her rewind the conversation to that place?
As it is, ChatGPT is ideally suited to my task but can't retain enough context. I'm willing to pay. I bought the $20 subscription for ChatGPT 4. I'll pay more, substantially more, like a few hundred dollars out of my own pocket.

Can anyone offer guidance? PLEASE!

My application Quanta has been offering social media features for a long time, and I have always said "all conversations are trees" (tree structures / tree nodes). When you reply to someone's post, that reply can be a branching-off point where other people can reply, and the thing you replied to is the "parent" node. So my platform is built entirely around trees of content.

When I recently added GPT into my platform (it can answer questions, etc.), I realized the same hierarchical structure is still relevant. GPT doesn't have "memory". To provide the "context" for a discussion, you have to send back the entire discussion along with each question. Yes, this is a problem that perhaps needs a better solution, but we're limited by how the LLMs themselves currently work.
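Concretely, "send back the entire discussion" means the request body carries every prior turn. A minimal sketch against the OpenAI chat completions API (v1.x Python SDK); the example content is of course made up:

```python
from openai import OpenAI

client = OpenAI()

# The model is stateless: "context" is just the whole discussion, resent each time.
history = [
    {"role": "user", "content": "What is a B-tree?"},
    {"role": "assistant", "content": "A self-balancing tree used in databases..."},
]
history.append({"role": "user", "content": "How does it differ from a binary tree?"})

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=history)
history.append({"role": "assistant", "content": response.choices[0].message.content})
```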

Anyway, the way Quanta solves this is with what it calls "Hierarchical Contextual Memory" (HCM). When you ask a question in Quanta, it walks up its "tree", parent by parent, until it reaches the beginning of the conversation, and it includes that path as the history to send in any HTTP API request.

This HCM idea is eventually how all these kinds of chat systems will work, because the idea is too obvious and too good. Being able to roll back to a specific point in a conversation and branch it off in another direction is extremely powerful.
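To show what the parent walk looks like, here is a hedged sketch of the idea (not Quanta's actual code; the node shape is my own illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    role: str                     # "user" or "assistant"
    content: str
    parent: Optional["Node"] = None

def build_context(node: Node) -> list[dict]:
    """Walk parent by parent to the root, then return the path oldest-first."""
    path = []
    while node is not None:
        path.append({"role": node.role, "content": node.content})
        node = node.parent
    return list(reversed(path))   # ready to send as the API message history

# Branching is free: two replies can share the same parent, and each branch
# rebuilds its own history simply by walking its own path back to the root.
```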

This looks like a promotional plug. Be that as it may, where can I take a look at Quanta? Can I use it myself?

You can just request an export of your ChatGPT data. Look at how an output refers to another output ID in the database. Figure out whether 50 database retrievals plus some semantic similarity matching and an embeddings call is better than just a list of user/assistant exchanges per session.

Hi _i
Thanks, but I have no idea what any of your post means!
Request an export:
You mean dump my conversation to a text file? Or would I be exporting something closer to the metal, like NN coefficients? I know it's not the latter because it would be too unwieldy, etc. But I'm just trying to get my question across.

Look at how an output refers to another output ID in the database:
OK, you're maybe talking about exporting the set of low-level, behind-the-scenes transactions that ChatGPT does on my behalf when I use it? Maybe? Then I can study the lower-level source code? BTW, I haven't attempted to install or use any of the APIs yet.

Figure out: By this point I'm lost. I guess you're suggesting a different, lower-level way of setting up context compared to how I'm doing it with the text interface? I didn't know that existed.

Finally, do you happen to know the size of the context buffer when using the text interface? Are there any tricks I can use while staying inside the text interface?

Hey, I found a partial solution. I didn't know it, but in Chrome the ChatGPT interface has an "undo" button to rewind the conversation to an earlier point. I haven't tested it much yet, but if it works as I hope, then I can set up some rules (context) and have ChatGPT perform an operation. When that is done, I can rewind back to the point of submitting data for processing, then submit another piece of data. This way I can go into factory mode without constantly increasing the content of the context buffer.
It might not work if rewinding doesn't really reclaim space from the buffer, essentially creating a new conversation thread that contains the previous setup but not the actual execution on the presented data. Or else, it might just work!

Does anyone know whether this trick already works? Or do you know of another trick like this that does?

In ChatGPT, under your Settings → Data Controls, there is an option to export your ChatGPT conversation history.

Pressing the export button will result in an email being sent to you with your entire conversation history.

(that is, if you ever get the email)

When you look at the actual files you get, you will see that the way a conversation works is that one message ID refers to the previous message ID, and so on.
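A hedged sketch of following that ID chain, assuming the export's `conversations.json` stores each conversation as a `mapping` of node IDs to records with `parent`, `children`, and `message` fields; those names come from my own export and may differ between export versions:

```python
import json

with open("conversations.json") as f:
    conversation = json.load(f)[0]  # first conversation in the export

mapping = conversation["mapping"]   # node_id -> {message, parent, children}

# Start at a leaf (a node with no children) and walk parent IDs back to the root.
node_id = next(nid for nid, node in mapping.items() if not node["children"])
thread = []
while node_id is not None:
    node = mapping[node_id]
    if node.get("message"):         # the root node may carry no message
        thread.append(node["message"])
    node_id = node.get("parent")
thread.reverse()                    # oldest message first
```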

Examining ChatGPT's methods, especially when you edit a conversation and it goes in a different direction, is meant to be instructional, provided you can decode what all that code-like representation means from having examined JSON before.

That is the foundation of a ChatGPT "conversation". OpenAI has also used embeddings-based semantic search techniques to retrieve older conversation turns, so that if you ask "what did I name you 50 turns ago", the AI can still answer "I am called Bubba". The actual ChatGPT mechanism is their secret.


A semantic search to retrieve conversation from the distant past, beyond the reach of the language model's context, would use an embeddings model call to get a vector representation for each of those past turns (perhaps your question plus the AI's response). Each chat is then stored in a database along with its embedding (for OpenAI's text-embedding-ada-002, a 1536-dimension vector); similarity "scores" come from comparing these vectors.

That means that every question, and every question-and-reply pair, is text submitted to another type of AI model called an "embeddings" model (which is not as expensive).

When you ask another question that refers to old chat topics, that question can be compared against all the other chats in the conversation, and the most relevant ones from the distant past can be added back into the conversation before the assembled history is sent along with your latest question for the AI to answer.
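A minimal sketch of that retrieval step, assuming the embeddings are already stored as NumPy vectors alongside their turn text (the function names are mine, for illustration):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, i.e. most relevant."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recall_relevant_turns(question_vec: np.ndarray, stored, top_k: int = 3):
    """stored: list of (turn_text, embedding_vector) pairs from the database.

    Returns the top_k most relevant old turns, to be spliced back into the
    prompt ahead of the latest question."""
    ranked = sorted(stored, key=lambda item: cosine(question_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```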

The amount of chat history you can send depends on the model you use, the quality of memory you need, and your monetary budget. You probably don't want to fill the AI's context with 10,000 words every time you ask a question like "how big is a banana".
