Feature Request: Memory-informed inference

Right now, one issue I’ve noticed with agents is that the discontinuity between inferences means all internal context is lost at the end of each inference, except for the chat history. This is due to the stateless nature of each interaction.

To seem continuous, the inference uses the chat history up to that point, but this means each subsequent inference is effectively a new, separate inference every time you hit enter/send a request. So if an agent executes a command, the next inference may not know the ‘why’ behind the previous inference’s actions, or may not be able to infer key details from the provided chat history (text & images), unless the model is explicitly told to reason aloud, and even then that’s a bit hacky and often loses nuance. If something doesn’t get said aloud or a detail doesn’t make it into the chat history, the next inference can only work with what it can see in the chat history, hence losing context and not maintaining continuous thoughts & goals across inferences, as that dense internal context (the factors, especially unsaid ones, that contributed to the action; thoughts, intent, goals) does not survive the inference.

It would be nice to have an architecture-level way to carry important context between inferences, i.e. saved prior activations or something like a (‘Larimar’) memory informing the inference – perhaps RLHF’d to use the memory unit, effectively rewarding the storing, maintaining, and use of important context to complete goals & tasks across inferences over time.
Which might be easier than training a large SSM. Or maybe state-based systems, like an SSM akin to a large Mamba/Jamba architecture, will become more common.
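
To make the idea concrete, here’s a minimal sketch, assuming an entirely hypothetical interface (none of these names are real APIs), of what a memory-informed loop could look like from the outside: the model emits a dense state vector alongside its reply, that vector is written to a small memory unit, and the memory is read back on the next call rather than relying on the chat history alone.

```python
# Toy sketch only: MemoryUnit, run_turn, model, and encode are hypothetical
# stand-ins, not a real Larimar implementation or any existing API.
import numpy as np

class MemoryUnit:
    """Fixed-size slot matrix, written/read with simple dot-product addressing."""
    def __init__(self, slots=32, dim=256):
        self.M = np.zeros((slots, dim))

    def write(self, z):
        # Overwrite the slot most similar to z (hard addressing, for simplicity).
        idx = int(np.argmax(self.M @ z)) if self.M.any() else 0
        self.M[idx] = z

    def read(self, query):
        # Soft attention over slots; returns a blended memory vector.
        scores = self.M @ query
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.M

def run_turn(model, memory, chat_history, user_msg, encode):
    mem_vec = memory.read(encode(user_msg))        # dense context from prior turns
    reply, dense_state = model(chat_history, user_msg, mem_vec)
    memory.write(dense_state)                      # persist the 'why' for next turn
    chat_history.append((user_msg, reply))
    return reply
```

The point isn’t this particular addressing scheme; it’s that something dense survives the turn boundary without having to be verbalized into the chat history.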

If we had some mechanism for context to survive into the next inference(s), whether memory-informed inferences, prior activations, special cross-inference tokens or scratch tokens, or an SSM, we would be on a good path towards more fluid and coherent agents, as they would be able to ‘remember’ past inferences and carry goals, ‘thoughts’, and dense context through time.

Edit: by memory I mean a non-text-based solution, as text is not very compact, and if any details are missed the next inference still does not know the ‘why’ or the key details. Hence the need for something that preferably retains dense context.

A cosine-similarity DB of the embeddings would be somewhat better than just text, but a true memory unit akin to Larimar’s memory (autoencoder, parameters, decoder) with heavy RLHF to learn to store & use important context and update knowledge accordingly, or past activations with attention applied to help inform the current inference of important context, or an SSM state-based system, would be a preferred, more comprehensive solution. ><’
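
For reference, the cos-sim DB baseline is roughly the sketch below (plain NumPy, made-up names, no particular vector DB assumed). It helps, but only whatever got embedded survives, which is why I’d still prefer a learned memory unit.

```python
# Minimal cosine-similarity store over per-turn embeddings (illustrative only).
import numpy as np

class CosSimStore:
    def __init__(self):
        self.vectors, self.payloads = [], []

    def add(self, vec, payload):
        # Normalize once so a dot product equals cosine similarity later.
        self.vectors.append(vec / np.linalg.norm(vec))
        self.payloads.append(payload)

    def top_k(self, query, k=3):
        q = query / np.linalg.norm(query)
        sims = np.array([v @ q for v in self.vectors])
        idx = sims.argsort()[::-1][:k]
        return [(self.payloads[i], float(sims[i])) for i in idx]
```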

Though Larimar is largely focused on ‘knowledge updates’, the idea is to use the memory unit not just for knowledge updates but, RLHF’d, as an ad-hoc vehicle for getting dense context & important info across inferences,
i.e. rewarding storing, then successful retrieval, then successful use of intent, goals, the why/how of previous actions, and other dense & important context.
Or something similar with saved previous activations, etc. This is for long-term-planning agents, continuity across inferences, and overall coherence & fluid reasoning across inferences.
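
As a rough illustration of that reward idea (weights and signals entirely made up, not from the Larimar paper or any existing RLHF setup):

```python
# Hypothetical reward shaping: credit for writing to memory, for reading back
# something relevant later, and (mostly) for actually finishing the task.
def memory_reward(wrote_to_memory, retrieval_relevance, task_completed,
                  w=(0.1, 0.3, 1.0)):
    return (w[0] * float(wrote_to_memory)
            + w[1] * retrieval_relevance      # e.g. similarity of read vs. needed context
            + w[2] * float(task_completed))
```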

Example:

An AI is playing Pokémon via the PyAutoGUI and Keyboard modules, receiving screenshots, and executing commands.

The most recent image shows a naming screen that currently reads “AAA”.

Text history array:
- Move cursor to position (43,25)
- Press 'A'
- Press 'A'
- Press 'A' while the cursor was over the 'A' letter

The next screen now says “AAAA”
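
For context, the loop in this example is roughly the sketch below; `model_call` and the action format are hypothetical stand-ins. Note that only the action text lands in the chat history, while the reasoning that produced it is discarded each turn.

```python
# Stateless agent loop sketch (model_call and the action dict are made up).
import pyautogui

chat_history = []  # the text (+ screenshots) the next inference will see

def step(model_call):
    screenshot = pyautogui.screenshot()
    action = model_call(chat_history, screenshot)   # e.g. {"type": "press", "key": "a"}
    if action["type"] == "move":
        pyautogui.moveTo(action["x"], action["y"])
    elif action["type"] == "press":
        pyautogui.press(action["key"])
    chat_history.append(str(action))                # the 'what' survives, the 'why' doesn't
```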

The problem is that each inference can only access the chat history; it lacks the internal context, thoughts, and goals of previous inferences. While it’s possible to ask the AI to explicitly state its reasoning and goals, relying solely on text is a lossy way of conveying that information. If any details are missed, subsequent inferences may struggle to act coherently, be unable to accurately infer the omitted details or missing context, and fail to make decisions or carry out tasks effectively.

In the example above, the next inference can see that the previous inference(s) responded with
"Press ‘A’", but it likely doesn’t know the internal ‘why’: the thoughts, reasons, and goal(s) the previous inference had when it decided that was the best next action. It can only infer so much, since it’s strictly limited to what was said in previous responses, which can create ambiguous situations or lead to seemingly incoherent responses.

Though we know the architecture is inherently stateless in current tech, it’s easy to forget that each inference is indeed isolated, or to assume that because inferences are similar, the next one would somehow have exactly the same dense context and would just ‘know’ or perfectly infer the missing context, but this is usually not the case. :x

If it had some way to keep dense representations of context/goals/thoughts, or activations, across inferences, it would be in a better position than relying solely on the chat history.

~kinda reminds me of a person with no memory using a journal to baton-pass information to themselves; or a brand-new AI emerges from the ether each time you hit enter, is given only the chat history logs (text & images), and is told “FIGURE IT OUT” :rofl: lol