Contextual long-term memory

Even as a senior programmer in PHP and Python, I constantly suffer from the delusion that an AI model from OpenAI will always find the perfect answer to a question. That is probably because I started programming in PHP more than 20 years ago. What wonderful times these are, and how nice that I get to experience talking to a program, or rather to a model that a program has trained. But that is not the whole story, as I painfully realize whenever I neglect the context without which even the best model in the world merely resembles an intelligent being suffering from amnesia. So what are these great models worth if you don't carefully and patiently feed their contextual memory?
I am currently working on a fun project: a program that lets OpenAI's API models speak and transcribes your input from a microphone, so that the overall impression of a real conversation arises. I know it has been invented before, but I wanted to see what is under the hood, especially with regard to the memory phenomenon. I have also tried programs where the model was trimmed so heavily into a professor that it was no longer fun. My project already works quite well, but I had to refresh my Python knowledge in many areas; I was perhaps too spoiled by the more forgiving PHP language, whose community also provides much more centralized help. Be that as it may, I am still fiddling with the windows and parameters of the individual modules to improve the program's appearance. I would like to thank the people and the team at OpenAI for the great work they are doing; I can imagine how difficult it is, especially with regard to the issue above, to stay on the right path, particularly when it comes to token pricing and finding an appropriate basis for it.
I have often used the models to refresh my dried-up Python and simply have codebases generated for me. This works almost better with the 3.5 models than with GPT-4. But with this project I am now far beyond that, and I have found that it can also be pleasant to work out solutions interactively with the models. For example, I notice that GPT-3.5-Turbo does not know every aspect of the pywebview API; I chose that great module to make the program more portable. This turns a model into something like a good colleague (in the technical sense) who walks through solutions with me step by step. That is just as helpful and enjoyable as simply having code generated.

However, this also requires a model to remember the code that one is talking about all the time. It makes the task harder and reduces the advantage of a chat if the code base fades or is completely "forgotten" (in the technical sense). Today I experienced a funny anecdote while talking to a GPT-3.5-Turbo model in the OpenAI subscription chat about my Python voice chat, which is realized with the OpenAI API and Google Cloud's TTS API. For weeks, it had been clearly established between us what remembering and forgetting mean in a technical sense. Yet when asked about this, only a few dialogues later the language model explained that it had no human memory, which is why it could neither forget nor remember. I mention this example to illustrate the problem. This is not about criticism, but about a dialogue on how we can tame the chat models and make their "memory" fit for different tasks and performance levels. Limiting the context to 3-4 dialogues is probably the wrong approach. Increasing the context size to 32k tokens might be a step in the right direction, as long as it doesn't leave you bald from pulling your hair out. A fantastic approach is using embeddings and vectorization, as seen in many Git projects, and there are great projects that almost completely neutralize the problem. The only thing that must not happen is that this drives up the cost of tokens; otherwise, the consumer market and small IT companies will be left behind.
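The simplest of these strategies, trimming the conversation to a fixed token budget while always keeping the system prompt, can be sketched in a few lines. This is only an illustration, not code from my project: the token count here is a crude word-based estimate, and a real implementation would use an actual tokenizer such as tiktoken.

```python
# Sketch: keep the system prompt plus as many recent messages as fit
# into a fixed "token" budget, dropping the oldest messages first.
# The token estimate is deliberately crude (words + per-message overhead);
# a real implementation would count tokens with a proper tokenizer.

def estimate_tokens(message):
    # Very rough: one token per word plus a small per-message overhead.
    return len(message["content"].split()) + 4

def trim_history(messages, budget=3000):
    """Keep messages[0] (the system prompt) and the newest messages
    that still fit into the budget."""
    system, rest = messages[0], messages[1:]
    kept = []
    used = estimate_tokens(system)
    for msg in reversed(rest):          # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [{"role": "system", "content": "You are a helpful coding assistant."}]
for i in range(200):
    history.append({"role": "user", "content": f"question number {i}"})
    history.append({"role": "assistant", "content": f"answer number {i}"})

trimmed = trim_history(history, budget=300)
print(len(trimmed))            # far fewer messages than len(history)
print(trimmed[0]["role"])      # the system prompt is always kept
print(trimmed[-1]["content"])  # the newest message always survives
```

This is exactly the "forgetting" I complain about above: everything that falls out of the window is gone, which is why the retrieval approaches are so interesting.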


Very good question. Currently we're using embeddings with an index such as Faiss to work around the memory limitation on Sharly AI.
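For readers unfamiliar with this approach: the retrieval idea behind such an index can be sketched in plain Python. A real system would embed each message with an embedding model and search with a library like Faiss; the toy vectors below only show the mechanism (nearest-neighbour search by cosine similarity), and all names and example texts here are made up.

```python
import math

# Toy stand-in for an embedding index: store (vector, text) pairs and
# return the texts whose vectors are most similar to a query vector.
# In a real system the vectors come from an embedding model and the
# search is done by a vector library such as Faiss.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ToyMemory:
    def __init__(self):
        self.items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query, k=2):
        ranked = sorted(self.items,
                        key=lambda item: cosine(query, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

memory = ToyMemory()
memory.add([1.0, 0.0, 0.0], "we use pywebview for the window")
memory.add([0.0, 1.0, 0.0], "TTS runs via Google Cloud")
memory.add([0.9, 0.1, 0.0], "the UI is rendered by pywebview")

# A query vector pointing in the "pywebview" direction retrieves
# the two related notes, which can then be re-injected into the prompt.
print(memory.search([1.0, 0.05, 0.0], k=2))
```

Only the few retrieved snippets go back into the context window, which is how these projects keep long conversations within a small token budget.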