Episodic and declarative memory should probably be separate in AGI

daveshapautomator · November 3, 2021, 12:03pm

As I’m working on using fine-tuning to bring my RAVEN model into full functionality, I’m approaching the point of revisiting memory. In humans, these are separate systems. If evolution chose to keep episodic and declarative memory separate, there must be a reason, some advantage or limitation.

While they are both memory, perhaps their actual operation is very different. Certainly, the way we accumulate them is different, as is how we use them. Episodic memory is used to build a mental model of self and others, to construct a narrative history of our existence. Declarative knowledge, on the other hand, is auxiliary - merely used as a tool to support our journey through life.

It’s far more important for me to remember that I went into the office for the first time yesterday since the pandemic began. It’s important for me to remember the conversations I had. It’s much less important for me to recall the date of the explosion of Krakatoa, 1883, and yet I can do both. Since we can employ both episodic and declarative memory at the same time and often with equal ease, I figured they could be implemented as the same system in AGI.

However, I’m recalling some experiments I did. One such was using GPT-3 to just spit out random facts. It’s great at that, and would win at any game of trivia. But it can also confabulate. It’s this tendency to confabulate that worries me. Perhaps the episodic system should use low temperature so it simply reads and regurgitates memories from the database. In another experiment I attempted to use GPT-3 to discern episodic from declarative memory, and it utterly failed. I created a little scenario and asked it “Is Dave on fire?” and it answered “yes because I can see the flames”. Perhaps with fine-tuning, it could handle such tasks better. But now I’m wondering if GPT-3 should be involved in memory at all. Why not just search the database and transcribe memories verbatim? This would be faster and cheaper, after all. I’ve also hypothesized that AGI episodic memories should be stored in a blockchain so that they cannot be tampered with. (You may recall that tampering with AGI memory was a major plot point in Westworld.) It would make sense to store episodic memories in a blockchain, but perhaps not declarative knowledge.
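To make the tamper-resistance idea concrete, here is a minimal sketch of a hash-chained, append-only memory log using only the Python standard library. It is not a real blockchain: the `EpisodicLog` class and its field names are hypothetical, and a hash chain only makes tampering detectable, it does not prevent someone from rewriting the whole chain.

```python
import hashlib
import json
import time


class EpisodicLog:
    """Append-only memory log where each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder hash used before the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(timestamp, text, prev_hash):
        # Hash the entry contents together with the previous entry's hash.
        payload = json.dumps([timestamp, text, prev_hash]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, text, timestamp=None):
        timestamp = time.time() if timestamp is None else timestamp
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "timestamp": timestamp,
            "text": text,
            "prev_hash": prev_hash,
            "hash": self._digest(timestamp, text, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True only if no stored entry was altered or reordered."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._digest(entry["timestamp"], entry["text"], prev_hash)
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Editing the text of any earlier entry changes its expected hash, so `verify()` fails from that point on.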

However, the biggest problem with declarative knowledge is the question “what’s actually true and how do you know it’s true?” Fortunately, I don’t think this is actually a problem, so long as you handle it correctly. For instance, if you record metadata in your knowledge database you can keep track of who said what and when. For example, you might record a fact as having come from Wikipedia. You can then also look up the reliability of Wikipedia, as well as cross reference multiple sources. Finally, by just giving GPT-3 all this information, you can ask it how reliable the information is. It can handle ambiguity and questions of epistemology. Perhaps this difference underscores the main functional difference between episodic memory and declarative knowledge: episodic memory is taken as true and unquestioning, even if you can reinterpret it later. Declarative knowledge is never taken as true and must always be cross referenced and interpreted with a level of doubt.
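The metadata idea above could be sketched roughly as follows. Everything here is an assumption for illustration: the `Claim` and `KnowledgeBase` names, the reliability scores, and the noisy-OR rule for combining sources (which treats sources as independent, something real sources often are not).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    statement: str
    source: str       # who said it
    recorded_at: int  # when it was recorded, as a Unix timestamp


class KnowledgeBase:
    def __init__(self, source_reliability):
        # Hypothetical per-source reliability scores between 0 and 1.
        self.source_reliability = source_reliability
        self.claims = defaultdict(list)

    def add(self, claim):
        self.claims[claim.statement].append(claim)

    def sources_for(self, statement):
        """Distinct sources that have asserted this statement, sorted by name."""
        return sorted({c.source for c in self.claims.get(statement, [])})

    def confidence(self, statement):
        """Noisy-OR over distinct sources: 1 - product of (1 - reliability)."""
        doubt = 1.0
        for source in self.sources_for(statement):
            doubt *= 1.0 - self.source_reliability.get(source, 0.0)
        return 1.0 - doubt
```

With this rule a fact never reaches full certainty unless some source is scored as perfectly reliable, which matches the idea that declarative knowledge is always held with some doubt. The same records could also be handed to a language model as context when asking how reliable a fact is.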

This is a tough nut to crack and there are even harder nuts out there, such as cognitive control. How do you know when to stop talking and listen during an interruption? How do we make split-second judgments to avoid danger and harm? These interruptions are something that I have yet to figure out with NLCA. But it’s apparent that our brains are always prioritizing our attention. Once I get to the point of integrating cognitive control, I might have to read On Task again. For an example of cognitive control, imagine you’re unloading your dishwasher and a knife slips from your hand. It’s heading straight for your toes. You instinctively stop everything and yank your foot out of the way. You don’t just continue and ignore the falling knife. In another example, you start to speak at the same time as someone else, but you make a split-second decision to stop and listen. (It should be noted that this last task is particularly difficult for people with ADHD, which is a deficit of cognitive control, among other things.) Anyways, this all clearly underscores a need for an interrupt system within an AGI, but that’s a problem for future me. First, gotta figure out memory.

This is set to “looking for teammate” because I am! I need help with some of these implementations. Let me know if you’re interested.

daveshapautomator · November 3, 2021, 12:32pm

I think BigchainDB might be the correct library to use… see “Features & Use Cases” on the BigchainDB site.

danielhanchen · November 3, 2021, 4:07pm

Fascinating insights Dave!
Funnily I was just watching MIT Finance Theory I when weirdly the instructor talked about 3 parts of the human brain:

The Reptilian Brain (Brain Stem)
The Mammalian Brain (Middle of Brain)
The Humanoid Brain (Outer Brain)

The Reptilian Brain exists in reptiles and mammals, and is used for unconscious movements, i.e. involuntary actions: breathing, heart muscle moving, etc.

The Mammalian Brain exists in mammals. This is the fight or flight response you just mentioned + basic functionalities of emotion, pain, fear, love, etc.

Finally the Humanoid Brain controls logic, language, reasoning etc.

Fascinatingly, Transformers seem to have the Humanoid Brain, since it’s mostly a large nonlinear huge ginormous database. Obviously long term memory is an ongoing problem.

Transformers are missing the 2 other parts of the brain, i.e. the Mammalian Brain and the Reptilian Brain.

I’m slightly confused on episodic vs declarative, but I presume you mean long term memory vs memorization? Transformers seem to do fabulously in memorization, since the cost function is just “predict the next word”, i.e. cross entropy loss. Long term memory not so much, since it’s not trained to “predict the next document / next word 10,000 sentences ahead”.

The issue is Transformers have no inherent mechanism to remember. 2048 tokens is GPT3’s limit. Maybe GPTx has 16K tokens, but we need something on the order of millions or trillions of tokens, possibly sparsified. Another way is to extend GPT3 by adding a memory matrix (as I mentioned in other posts). This is a short on-the-fly scratch pad for GPT3 which GPT3 updates. I presume training it isn’t that complicated: the position in the text is chosen randomly and the length in tokens is randomized.

daveshapautomator · November 3, 2021, 6:59pm

Hmm, well, not sure why a finance guy was talking about the brain. There are… far more structures than that. Also, the amygdala controls fight/flight responses and it has a memory of its own. But perhaps that answers my question! AGI should have a digital equivalent to the amygdala, an independent system with a memory of its own that has the ability to issue interrupts. This is probably the answer.
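As a rough illustration of that idea, here is a toy sketch of an independent watcher with its own memory of danger cues that can halt a running task. The `ThreatMonitor` class, the `run_task` helper, and the substring matching are all hypothetical simplifications, not a design from NLCA.

```python
class ThreatMonitor:
    """Independent watcher with its own memory of danger cues."""

    def __init__(self):
        self.danger_cues = set()

    def learn(self, cue):
        # Store a cue that should trigger an interrupt in the future.
        self.danger_cues.add(cue.lower())

    def should_interrupt(self, observation):
        text = observation.lower()
        return any(cue in text for cue in self.danger_cues)


def run_task(steps, observations, monitor):
    """Run steps in order, consulting the monitor before each one.

    Returns the steps completed and a status string. The task stops as
    soon as the monitor flags the observation paired with a step.
    """
    completed = []
    for step, observation in zip(steps, observations):
        if monitor.should_interrupt(observation):
            return completed, f"interrupted by: {observation}"
        completed.append(step)
    return completed, "finished"
```

The point of the design is that the monitor does not share memory with the main task loop: it only sees observations and can only say "stop".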

Tangentially related, anxiety is when the amygdala is conditioned to go off all the time (or otherwise at incorrect times) because it has memories that are associated with fear/danger.

Episodic == what you did yesterday (aka personal narrative, things you actually experienced)
Declarative == there are 50 states in the US (independent facts you know about the world)

When you consider amnesia, people forget their personal story, but retain things like procedural memory (how to start a car) as well as declarative knowledge (basic facts about the world). So this indicates that different kinds of memory are stored differently in the brain. Why is this? There must be some reason. It could simply be that different memory structures evolved at different times. For instance, the amygdala might be the most primitive memory structure (it also has a longer memory than our conscious memories!).

Depends on how you look at it. Another way to look at it is that it is strictly a memory machine. You could even encode a transformer by fine-tuning it with episodic memories.

Oh, on the topic of memory, humans have other places to store memories, including the somatic nervous system, the enteric nervous system, and the brainstem (on top of the multiple places in the brain where memories are stored).

So it’s more a question of plumbing than anything else. GPT-3 can perform both processing (thinking) and storage (remembering) just like how the human brain can do both. In fact, the smallest unit of brains (neurons) perform both storage and processing. You cannot have intelligence without both memory and processing.

danielhanchen · November 4, 2021, 2:27am

LOLL ye I was a bit confused why the MIT guy was talking about the brain as well! But he was saying it cause the Efficient Markets Hypothesis says everyone is rational and you cannot predict markets. But he says that’s wrong at certain times in the market, and he proposed the Adaptive Markets Hypothesis, which incorporates irrational emotional thinking (hence the brain talk).

OHHH ok ok. Interesting. The only thing I heard about memories is during sleeping, ur hippocampus magically stores ur memories in a library, and clears stuff which isn’t important.

Yes you can fine-tune it, but it’s cumbersome. I doubt the brain does gradient descent every time a new memory comes (or does it?)

Hmm not sure on neurons sadly. Not an expert sorry! Oh didn’t know neurons also have memory? I thought it was just neurotransmitters near the dendrite gaps n stuff, and they need some sort of activation potential to fire. Though what ur saying sounds right!

daveshapautomator · November 4, 2021, 6:32am

The hippocampus is involved in integrating memories but it’s active all the time, not just when you’re asleep. And the library analogy… isn’t even used in first-year neuroscience, so yeah, that’s probably more like a middle school biology level. Brain memories are diffuse, just like embeddings in GPT-3. And no, the brain doesn’t use a gradient descent algorithm, it’s basically just a sort of reinforcement learning algorithm at the neuron level.

Every neuron performs memory by virtue of the strength of its synaptic connections. It “remembers” its weights and biases. The net effect is that this facilitates both processing and memory in the same way that every parameter of GPT-3 is involved in both memory and processing.

In fact, humans generally confabulate our memories. Our brains use incredibly lossy compression, basically just using pointers to ideas, concepts, people, and places. This allows for very sparse representations of memories as well as huge amounts of reuse and recycling of neural circuitry. What this means is that our memories must be reconstituted in order to be brought into our consciousness. This process of reconstructing and reconstituting memories is what I’m trying to figure out. The difference with machines is that we have the storage space to store it all as explicit memories. We aren’t forced by the necessities of biology to use lossy compression algorithms. But this still presents new problems.

On the one hand, we could rely on neural embeddings to encode memories in GPT-3. This is lossy and unreliable (just like human memory), but it requires no additional systems or plumbing. On the other hand, we can explicitly store all memories in a database of some sort, but then retrieval becomes difficult.
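For the explicit-database option, a deliberately crude sketch of verbatim retrieval might look like this. The `retrieve` function and its word-overlap scoring are assumptions for illustration; the weakness of this kind of scoring (common words like "the" count as matches) is one reason retrieval is the hard part.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(memories, query, top_k=3):
    """Return up to top_k stored memories verbatim, ranked by word overlap.

    Memories that share no words with the query are left out. Ties are
    broken by insertion order (older memories first).
    """
    query_words = tokenize(query)
    scored = []
    for index, memory in enumerate(memories):
        overlap = len(query_words & tokenize(memory))
        if overlap:
            scored.append((-overlap, index, memory))
    scored.sort()
    return [memory for _, _, memory in scored[:top_k]]
```

Because the stored strings are returned unchanged, nothing is confabulated at recall time; the lossy step moves into the ranking instead.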

danielhanchen · November 4, 2021, 3:23pm

Hmmm fascinating. My apologies! I haven’t learnt any neuroscience stuff, so I’m like a toddler in the field!! Yep brains don’t use SGD, I was just saying finetuning might be cumbersome.

Fascinating. Always new stuff to learn! Interesting! I guess you’re right. The number of neurotransmitters a neuron sends from dendrite A to B, the size / shape of the dendrite, etc. I guess maybe encodes “weights / biases”? I don’t know.

Yes yes! Interesting! Ye like if we have 1T memories, the firing pattern for every memory is different supposedly, and as you said, the sparseness / compressed approach allows recycling of neural circuits.

Ye an explicit database of memories is complex. Another option is a limited scratchpad database, say we limit it to length 16K or something. Maybe the updating of the scratchpad is also done via the Attention mechanism, or just dot products, and we append the scratchpad with the embeddings, i.e. 2048 + 16K tokens. Obviously training this is hard, since teacher forcing can’t be used anymore, since recurrence exists.

renanm.b · January 10, 2022, 11:40pm

I suppose a summary of this is that a large Transformer model is neither episodic nor declarative memory.
It’s just a giant database of correlations (knowledge) that requires certain inputs to retrieve the knowledge.
There is also a realization that a further evolved cognitive architecture is needed to achieve AGI, and that AGI won’t be achieved by a single machine learning model.
Well, memory mechanisms are interesting. The brain uses grid cells and mechanisms that can be simplified into gradient descent optimization.
I just feel that the missing piece is how we create artificial cortical columns, how they will interact with the memory mechanisms, and most importantly how they will interact with all the other cortical columns.
I feel that transformers are a good way to create the building blocks of the cortical columns, but a lot of pieces are still missing. GLOM might be pointing in the right direction, but it by no means recreates the behavior of a cortical column.

daveshapautomator · January 10, 2022, 11:53pm

I don’t think it would be efficient (or desirable) to replicate human intelligence, but that’s my opinion. Machines are not bound by the limitations (or advantages) of organic computation. As such, machine intelligence can and will be different from human intelligence, and therefore we should seek to design complementary, rather than identical, systems.

pappachuck · January 11, 2022, 6:16pm

ACT-R just does that and works very well. It is not and will never be a copy of how the biology does the job, but it is a pretty good strategy.
I personally think we will at some point come up with something better than what nature would ever come up with.
Better than Human is the goal.

kevin6 · January 11, 2022, 9:17pm

I also have an idea but I’m not trying it out yet.

I think 2048 tokens is enough for working memory.
For episodic memory, you may need constant fine-tuning. Imagine a child who has a notebook and writes down every new thing that he learns. To learn new things, the child needs to read through his notebook once or twice a day. You may need to repeat the fine-tuning on the same paraphrased prompt and completion more than once to build up the memory. Use the Unix timestamp instead of ‘yesterday’ or ‘this morning’.
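A small sketch of what those notebook entries could look like as training data, with an absolute Unix timestamp in place of relative words like ‘yesterday’. The prompt wording, the `to_record` and `to_jsonl` helpers, and the choice of a prompt/completion JSON Lines layout are assumptions for illustration, not a tested fine-tuning recipe.

```python
import json
from datetime import datetime, timezone


def to_record(event, when):
    """Build one prompt/completion pair keyed by an absolute Unix timestamp."""
    unix_ts = int(when.timestamp())
    return {
        "prompt": f"What happened at Unix time {unix_ts}?",
        "completion": event,
    }


def to_jsonl(events):
    """Serialize (event, datetime) pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(to_record(event, when)) for event, when in events)


# Example usage with a timezone-aware datetime so the timestamp is unambiguous.
example = to_jsonl([
    ("Went into the office.", datetime(2021, 11, 3, 12, 3, tzinfo=timezone.utc)),
])
```

Paraphrased variants of the same memory would simply be extra records that share the same timestamp.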

daveshapautomator · January 12, 2022, 12:11am

Yes, I outlined several learning modalities in my book, which I can link if you haven’t seen it.

Retrieval (from online, database, API)
Periodic finetuning/integration/online learning
Background learning (augmenting the common crawl, etc)

Still, human brains can keep different kinds of memories separate, which is likely important. For instance, you might get amnesia and forget your life but you still remember facts about the world and how to speak. This indicates that self-narrative is stored one way while general facts (and other abilities) are stored a different way.

pappachuck · January 12, 2022, 6:26pm

I was hoping you would get the reference from Blade Runner
