Rn, one issue I’ve noticed with agents is that the discontinuity between inferences means all internal context is lost at the end of each inference except for the chat history. This is due to the stateless nature of each inference.
To seem continuous, the model uses the chat history up to that point, but that means each time you hit enter/send a request you effectively get a new, separate inference. So if an agent executes a command, the next inference may not know the ‘why’ behind the previous inference’s actions, or may not be able to infer key details from the provided chat history (text & images), unless it’s explicitly instructed to reason aloud, and even then that’s a bit hacky and often loses nuance. If something doesn’t get said aloud, or a detail doesn’t make it into the chat history, the next inference can only work from what it can see in that history. Context is lost and continuous thoughts & goals aren’t maintained across inferences, because the dense internal context (the factors, especially unsaid ones, that contributed to the action; thoughts, intent, goals) doesn’t survive the inference.
It would be nice to have an architecture-level way to carry important context between inferences, i.e. saved prior activations, or something like a (‘Larimar’) memory unit informing the inference, perhaps RLHF’d to use the memory unit, effectively rewarding storing, maintaining, and using important context to complete goals & tasks across inferences over time.
That might be easier than training a large SSM. Or maybe state-based systems, i.e. an SSM akin to a large Mamba/Jamba arch, will become more common.
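For concreteness, here’s a toy illustration of what “state survives the call boundary” means: a hand-rolled diagonal linear SSM in numpy, not any real Mamba/Jamba API (the dimensions and function names are made up). The point is just that the recurrent state produced by one ‘inference’ is returned and fed into the next one instead of being discarded.

```python
import numpy as np

# Toy diagonal linear SSM step: h' = A*h + B*x, y = C @ h'
# (illustrative only; real Mamba layers use input-dependent/selective params)
A = np.full(8, 0.9)        # per-channel state decay
B = np.random.randn(8)     # input projection
C = np.random.randn(8)     # output projection

def run_inference(tokens, state):
    """Process one 'request' and return (outputs, final_state)."""
    h = state.copy()
    outs = []
    for x in tokens:       # x is a scalar stand-in for an embedded token
        h = A * h + B * x
        outs.append(float(C @ h))
    return outs, h         # the final state is handed back, not thrown away

state = np.zeros(8)                            # fresh state for a new session
_, state = run_inference([0.3, -1.2], state)   # inference 1
_, state = run_inference([0.7], state)         # inference 2 starts from the saved state
```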
If we had some mechanism for context to survive into the next inference(s), whether memory-informed inferences, prior activations, special cross-inference/scratch tokens, or an SSM state, we’d be on a good path towards more fluid and coherent agents, as they’d be able to ‘remember’ past inferences and carry goals, ‘thoughts’, and dense context through time.
Edit: by memory I mean a non-text-based solution, since text is not very compact, and if any details are missed the next inference still doesn’t know the ‘why’ or the key details. Hence the preference for something that retains dense context.
A cosine-similarity DB of embeddings would be somewhat better than just text, but a true memory unit akin to Larimar’s (encoder, memory module, decoder) with heavy RLHF to learn to store & use important context and update knowledge accordingly, or past activations with attention applied to inform the current inference of important context, or an SSM state-based system would be a preferred, more comprehensive solution. ><’
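Something like this is the cos-sim DB baseline I mean, as a minimal sketch (`embed(...)` is a placeholder for whatever embedding model you’d use, not a real API):

```python
import numpy as np

class EmbeddingMemory:
    """Minimal cosine-similarity store: text snippets keyed by their embeddings."""
    def __init__(self):
        self.vectors, self.texts = [], []

    def store(self, text, embedding):
        self.vectors.append(np.asarray(embedding, dtype=float))
        self.texts.append(text)

    def recall(self, query_embedding, k=3):
        q = np.asarray(query_embedding, dtype=float)
        sims = [
            float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
            for v in self.vectors
        ]
        top = np.argsort(sims)[::-1][:k]
        return [(self.texts[i], sims[i]) for i in top]

# Usage: after inference N, store intent/goal notes; before inference N+1, recall them.
# mem = EmbeddingMemory()
# mem.store("ran the tests because the refactor touched the parser", embed(...))
# relevant = mem.recall(embed("why did the agent run the tests?"))
```

It helps retrieval, but what gets stored is still text, so anything that never got written down is still gone, which is the whole problem.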
Though Larimar is largely focused on knowledge updates, the idea here is to use the memory unit not just for knowledge editing but, via RLHF, as an ad-hoc vehicle for carrying dense context & important info across inferences, i.e. rewarding storing, then successful retrieval, then successful use of intent, goals, the why/how of previous actions, and other dense & important context.
Or something similar with saved previous activations etc., for long-term planning agents, continuity across inferences, and overall coherence & fluid reasoning across inferences.
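To make that less hand-wavy, here’s a heavily simplified sketch of the kind of memory unit I’m picturing, loosely inspired by the Larimar-style encoder/memory/decoder split but not the paper’s actual method (the names, sizes, and soft-write scheme are invented for illustration): one inference writes a dense summary vector into a slot matrix, and the next inference reads from it with attention, so the ‘why’ survives as vectors rather than text.

```python
import torch
import torch.nn as nn

class CrossInferenceMemory(nn.Module):
    """Simplified slot memory: write dense context after one inference,
    read it back (via attention over slots) to condition the next one."""
    def __init__(self, d_model=64, n_slots=16):
        super().__init__()
        self.slots = torch.zeros(n_slots, d_model)    # persists across inferences
        self.write_key = nn.Linear(d_model, n_slots)  # picks which slots to update
        self.read_query = nn.Linear(d_model, d_model)

    def write(self, context_vec):
        """Store a dense summary of the finished inference (intent, goal, 'why')."""
        gates = torch.softmax(self.write_key(context_vec), dim=-1)   # (n_slots,)
        self.slots = self.slots + gates.unsqueeze(-1) * context_vec  # soft write

    def read(self, query_vec):
        """Attend over slots to recover context for the next inference."""
        q = self.read_query(query_vec)                  # (d_model,)
        attn = torch.softmax(self.slots @ q, dim=-1)    # (n_slots,)
        return attn @ self.slots                        # (d_model,) read vector

# Inference N: encode its hidden state / intent into a vector and write it.
mem = CrossInferenceMemory()
mem.write(torch.randn(64))           # stand-in for an encoded "why I ran that command"
# Inference N+1: read the memory and feed the vector in alongside the prompt,
# e.g. as an extra soft token or added to the input embeddings.
context = mem.read(torch.randn(64))  # stand-in for the new request's encoding
```

The RLHF part would then be about rewarding writes whose later reads actually help the agent complete the task, rather than hard-coding what gets stored.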