At the moment, if I have a large system prompt, it runs every single time.
What do you guys think about a way that we could save the inner GPT state after the system message and continue from there?
This would save a LOT of money AND a LOT of response time.
I understand you can’t save every request like in ChatGPT, but maybe I could save my system prompt via some separate call and then start from there right off the bat. It would enable so much more capability, like loading a large text file once and then asking in the user message which information is relevant for a given request, etc.
It works just fine in ChatGPT.
What do you guys think?
Every token sent to the AI has to be processed; it does not matter where it gets stored, it has to be included each time. So while your idea would work for just you, if everyone had a custom system prompt that OpenAI paid to process every time… you see the problem.
The problem seems to be that people do not understand how these systems work, or that neural networks are memoryless.
That’s why every few days we see someone asking, “what if it just remembered this so I don’t need to pay so much?”
If the system prompt is so large it becomes cost prohibitive to use, one possible solution to look into is running a fine-tuned model instead.
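To see why this matters for the bill, here is a rough back-of-the-envelope sketch. The token counts, request count, and per-token price below are entirely hypothetical, purely for illustration of how a resent system prompt dominates input cost:

```python
# Back-of-the-envelope cost of resending a large system prompt on every
# request. Prompt sizes and the price here are made-up illustration values.

def prompt_cost(system_tokens: int, user_tokens: int,
                requests: int, price_per_1k: float) -> float:
    """Total input cost when the full system prompt is re-processed
    on every request (how a stateless chat API bills)."""
    tokens_per_request = system_tokens + user_tokens
    return tokens_per_request * requests * price_per_1k / 1000

# A 3,000-token system prompt resent over 1,000 requests dominates the bill:
full = prompt_cost(system_tokens=3000, user_tokens=200,
                   requests=1000, price_per_1k=0.01)

# If the system prompt could somehow be processed once and never again,
# only the user tokens would be billed per request:
hypothetical = prompt_cost(system_tokens=0, user_tokens=200,
                           requests=1000, price_per_1k=0.01)

print(full, hypothetical)  # 32.0 2.0
```

The gap is exactly what the original poster is reacting to; the disagreement in this thread is only about whether "process it once" is technically possible.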
No, you didn’t quite understand the idea. Process it once and save the internal state of the GPT at that moment. Not the tokens.
I think you’re making some incorrect assumptions about how ChatGPT works. It’s not saving the model state, just saving the text history of the conversation. Same as you have to do with the API.
I am very certain that it is not memoryless; we just don’t get access to the memory in the API. ChatGPT has memory, for sure. I have a few conversations that are more than 20k tokens long in total (writing a book chapter by chapter, for example), and the response starts within 3 seconds. They are not running the whole conversation through the model every time. Do you get what I mean?
ChatGPT is very good at inferring context across time by using the surrounding language, but that is not the same as having a long-term memory. LLMs have a fixed, volatile context: that context is tokenised text, which gets turned into a vectorised embedding as an initial representation of the prompt, and that embedding is then used as input for the transformer layers of the model. Each time that is done the result is different, even with small changes to the prompt. Our current understanding of neural nets and the way they encode data is insufficient to do what you request in any practical way.
Damn, I guess I am wrong then. I’ll have to read up on it.
Yup, the prompt input that was loaded into the context of the AI model engine is wiped the second your completion is done. There isn’t one model waiting just for you to use it; there are millions of users and a packed datacenter of inference servers.
ChatGPT has a backend database containing conversations (which is what you see rendered in the UI) and context code for passing back the few turns that might be relevant the next time you ask something, just as you’d have to program yourself for similar (or easily better) performance.
It’s absolutely memoryless.
Even OpenAI sends the entire context with each message in ChatGPT.
It’s just a bunch of multiply-and-accumulate steps, and it’s not feasible to build memory into that.
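The context code mentioned above can be sketched in a few lines. This is a minimal, hypothetical version (all names are made up for illustration): the full conversation lives in storage, but only the system prompt plus a sliding window of recent turns is replayed into the stateless model on each call:

```python
# Minimal sketch of backend context management: the whole conversation is
# stored, but each model call only replays the system prompt plus a
# sliding window of the most recent turns. Illustrative names only.

def build_context(system_prompt: str, history: list[dict],
                  max_turns: int = 6) -> list[dict]:
    """Return the message list actually sent to the (memoryless) model:
    the system prompt followed by the last `max_turns` messages."""
    recent = history[-max_turns:]
    return [{"role": "system", "content": system_prompt}] + recent

# Simulate a stored 10-exchange conversation:
history = []
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

messages = build_context("You are a helpful editor.", history, max_turns=6)
# Only the system prompt and the last 3 user/assistant exchanges go in:
print(len(messages))  # 7
```

This is why a long ChatGPT thread can still respond in seconds: the model never sees the whole 20k-token history, just a managed slice of it, re-tokenised and re-processed on every single call.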
To summarize: it’s not an RNN, so it’s not possible.