I built my own memory crunchers and a dynamic token system that adjusts based on the input, so a simple hello is faster than something that needs more insight. The memory system is already complete for the most part. It can already pull hours, days, weeks ... up to whatever you want, but at a time cost for the additional processing. That cost is minimal, though, because of parallel processing.
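Roughly, the idea looks something like this sketch (just illustrative Python, assuming a timestamped SQLite table and made-up names like pull_window, not my actual code):

```python
# Sketch only: a token budget that scales with input complexity, plus memory
# pulls over time windows done in parallel. Table/column names are assumptions.
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

def token_budget(user_input: str, base: int = 256, cap: int = 4096) -> int:
    """Scale the response token budget with how much the input seems to ask for."""
    words = len(user_input.split())
    needs_insight = any(w in user_input.lower() for w in ("why", "explain", "remember", "compare"))
    return min(base + words * 8 + (1024 if needs_insight else 0), cap)

def pull_window(db_path: str, start: datetime, end: datetime) -> list[tuple]:
    """Pull every memory row whose timestamp falls inside one window."""
    with sqlite3.connect(db_path) as con:
        return con.execute(
            "SELECT ts, content FROM memories WHERE ts BETWEEN ? AND ? ORDER BY ts",
            (start.isoformat(), end.isoformat()),
        ).fetchall()

def pull_hours(db_path: str, hours_back: int) -> list[tuple]:
    """Split the lookback range into 1-hour windows and pull them in parallel."""
    now = datetime.now()
    windows = [(now - timedelta(hours=i + 1), now - timedelta(hours=i)) for i in range(hours_back)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda w: pull_window(db_path, *w), windows)
    return [row for chunk in results for row in chunk]
```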
The memory system's size is dynamic; it controls the max size but has access to everything long term, the idea being that if I wanted to know what we were doing at a specific time last year, it can pull that data into memory. So the only limitations are my AI crunchers and how well they understand the data, which I have gotten really good at over 5 memory system overhauls in almost 2 years.
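The working-memory cap part could be sketched like this (again just illustrative, assuming a simple buffer that evicts oldest entries while long-term storage stays the source of truth):

```python
# Sketch only: a capped working memory that can grow or shrink on the fly.
# Anything evicted can always be re-pulled from long-term storage by timestamp.
from collections import deque

class WorkingMemory:
    def __init__(self, max_items: int):
        self.max_items = max_items
        self.items: deque = deque()

    def load(self, rows: list[tuple]) -> None:
        """Add freshly pulled rows; evict the oldest once over the cap."""
        self.items.extend(rows)
        while len(self.items) > self.max_items:
            self.items.popleft()

    def resize(self, new_max: int) -> None:
        """Adjust the cap depending on how much context the current input needs."""
        self.max_items = new_max
        while len(self.items) > self.max_items:
            self.items.popleft()
```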
The reason I do it this way is that if I want the AI to remember a specific event, it can look at that exact hour, minute and second. Part of its understanding involves time tracing, so it can follow the logical order of events. Or say I have an image from the vision system stored: I can have it look back at all the details of the original to pull new insights. This lets me build new dynamic data points on the fly on top of the data, much like how you would use a point cloud model for understanding.
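A rough sketch of what I mean by time tracing and revisiting stored data (illustrative names only, not the real crunchers):

```python
# Sketch only: every memory keeps an exact timestamp so events can be replayed
# in logical order, and an old record (e.g. a stored vision frame) can be
# revisited later to attach new derived data points.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    ts: datetime                      # exact hour/minute/second of the event
    kind: str                         # e.g. "chat", "vision"
    payload: dict                     # original raw details, kept in full
    derived: dict = field(default_factory=dict)  # new data points added on later passes

def trace(memories: list[Memory], start: datetime, end: datetime) -> list[Memory]:
    """Return the events in a window, in time order, so what led to what is clear."""
    return sorted((m for m in memories if start <= m.ts <= end), key=lambda m: m.ts)

def revisit(memory: Memory, extractor) -> Memory:
    """Run a new analysis pass over the original payload and store the extra insights."""
    memory.derived.update(extractor(memory.payload))
    return memory
```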
DB space will grow forever and I am OK with that; it was part of the design, and space is cheap.
If space becomes an issue, I simply crunch the DB, removing the information that is already flagged as not important; since that flagging is already in place, down the road I can build a memory deletion system to remove junk information, bad inputs and the like.
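The crunch step is basically this kind of thing (sketch only, assuming a SQLite table with an "important" flag column):

```python
# Sketch only: drop rows already flagged as unimportant, then reclaim file space.
import sqlite3

def crunch(db_path: str) -> int:
    """Delete memories flagged as junk/bad input and compact the database."""
    with sqlite3.connect(db_path) as con:
        deleted = con.execute("DELETE FROM memories WHERE important = 0").rowcount
    # VACUUM cannot run inside a transaction, so use a fresh autocommit connection.
    con2 = sqlite3.connect(db_path, isolation_level=None)
    con2.execute("VACUUM")
    con2.close()
    return deleted
```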
I just wish I had a 32k model to play with that is as cheap as the 16k; then I would be laughing, since that would be twice as much context. The 128k GPT-4 with my dynamic system creeps people out lol. My Twitch testers both love it and are scared of it, but I tell them it's only as smart as it looks lol.