The Neuroscience Mechanisms Behind Large Language Models

The emergence of large language models like ChatGPT has drawn our attention to the field of artificial intelligence. AI is becoming increasingly prevalent in our lives, and industry leaders are making bold predictions: OpenAI believes that superintelligence will be born within a decade, and several leaders think that AI will take over humanity, that humans are merely one stage in the development of intelligence, and that we are stepping stones for machine intelligence. All of this forces us to contemplate the relationship between AI and ourselves.

We often see people scoffing at large language models, considering them to be mere “text generators” or “probability chain generators,” incapable of truly understanding the world. But if language models can’t understand the world, how can they speak like humans? How can they achieve excellent scores in various tests and even develop theory of mind capabilities? This article attempts to uncover the neuroscientific mechanisms behind large language models, which will help us understand AI and also provide insights into our own brains.

Grid cells are a type of neuron found in the brains of many species. They reside in the entorhinal cortex, have distinctive spatial firing characteristics, and show a grid-like firing pattern. Scientists first discovered that grid cells encode a representation of Euclidean space, which helps animals navigate.

The study “Inferences on a multidimensional social hierarchy use a grid-like code” found that abstract knowledge is also represented in the brain in the form of maps, stored in the hippocampus and the entorhinal cortex. This is our cognitive map. In reasoning tasks, specific vectors on the map are activated for decision-making. We use the same process to handle social information, which may form the basis for our development of theory of mind. That is, we model social relationships as cognitive maps in our brains, and we make inferences and decisions based on those maps.

What does this have to do with large language models? The paper “Relating transformers to models and neural representations of the hippocampal formation” found that transformer models are mathematically closely related to models of the hippocampal formation, and that they reproduce the kind of spatial representations seen in grid cells and place cells. Therefore, our transformer-based large language models (such as GPT, Bard, and Wenxin Yiyan) are actually mimicking the way the hippocampus and the entorhinal cortex process information.
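To make the comparison a little more concrete, here is a minimal sketch of scaled dot-product attention, the core operation of a transformer, written as a content-addressable memory lookup. It only illustrates the standard attention formula; it is not the model from the paper, and the toy `keys`, `values`, and `query` are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity between the query and every stored key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the stored values.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights

# Hypothetical toy memory: three stored (key, value) pairs.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
query = [4.0, 0.0]  # points in the same direction as the first key

output, weights = attention(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("output:", [round(x, 3) for x in output])
```

The query retrieves mostly the value stored under the most similar key, which is why attention is often described as a soft memory lookup.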


GPT-3 and GPT-4 have been reported to possess theory of mind, with GPT-4’s scores on theory of mind tests even higher than the human average. Based on the research above, it is not difficult to understand why. These models mathematically mimic the modeling approach of grid cells. Therefore, like the human hippocampus and entorhinal cortex, they can model social relationships as cognitive maps within their neural networks and make inferences and decisions based on those maps.

Past research has shown that damage to the left hippocampus affects memory for verbal information. The study “Grid-like and distance codes for representing word meaning in the human brain” pointed out that two features of the two-dimensional cognitive map have been reported in the human neuroimaging literature: grid-like codes and distance-related codes. By applying a combination of representational similarity and fMRI-adaptation analyses, the study found evidence of (i) a grid-like code, in the right postero-medial entorhinal cortex, representing the relative angular positions of words in the word space, and (ii) a distance-dependent code, in medial prefrontal, orbitofrontal, and mid-cingulate cortices, representing the Euclidean distance between words. The authors also found evidence that the brain separately represents the single dimensions of word meaning: implied size, encoded in visual areas, and implied sound, encoded in Heschl’s gyrus/insula.
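As a rough sketch of what these two codes measure, the snippet below places a few words in a two-dimensional (size, sound) space and computes the Euclidean distance between two words and the direction of the path from one to the other. The coordinates and word names are hypothetical, and the six-fold cosine used for the grid-like signal is an assumption borrowed from the way grid-like codes are commonly modeled in fMRI work, not the study's actual analysis.

```python
import math

# Hypothetical word space: each word has an implied (size, sound) coordinate.
WORD_SPACE = {
    "word_a": (1.0, 1.0),
    "word_b": (4.0, 5.0),
    "word_c": (1.0, 3.0),
}

def distance_code(w1, w2):
    """Euclidean distance between two words in the 2D word space."""
    (x1, y1), (x2, y2) = WORD_SPACE[w1], WORD_SPACE[w2]
    return math.hypot(x2 - x1, y2 - y1)

def direction_deg(w1, w2):
    """Angle (0 to 360 degrees) of the path from w1 to w2."""
    (x1, y1), (x2, y2) = WORD_SPACE[w1], WORD_SPACE[w2]
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0

def gridlike_signal(theta_deg, grid_phase_deg=0.0):
    """Assumed six-fold modulation: peaks every 60 degrees from the grid phase."""
    return math.cos(math.radians(6.0 * (theta_deg - grid_phase_deg)))

print("distance a-b:", distance_code("word_a", "word_b"))
print("direction a->c:", direction_deg("word_a", "word_c"))
print("grid signal, aligned path:", round(gridlike_signal(60.0), 3))
print("grid signal, misaligned path:", round(gridlike_signal(30.0), 3))
```

The distance function depends only on how far apart two words are, while the grid-like signal depends only on the direction of travel between them, which is the distinction the study draws between the two codes.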


This research seems to indicate that the human brain holds something similar to the word vectors commonly used in natural language processing (which carry the semantic information of words), that it encodes the positions of words in that space, and that it does so with a grid-like representation model comparable to a transformer. The paper also mentions that when humans compare newly learned words, they recruit a grid-like code and a distance code. These are the same types of neural codes that, in mammals, represent the relationships between positions in the environment and support physical navigation.

What does this remind us of? When we train large language models, we input word embeddings into the model. Word embeddings encapsulate the relationships between words. For example, ‘cat’ and ‘dog’ are both animals, so the vectors for ‘cat’ and ‘dog’ are closer to each other than either is to the vector for ‘tree.’ The human brain generates a word vector space during the learning process, which is crucial for our understanding of the world. Similarly, we provide artificial neural networks with word vector spaces to help them understand the world.
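The ‘cat’, ‘dog’, and ‘tree’ example can be sketched with cosine similarity, a common way to compare word vectors. The three-dimensional vectors below are made up for illustration; real embeddings are learned and have hundreds or thousands of dimensions.

```python
import math

# Toy embeddings, invented for this example only.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "tree": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
sim_cat_tree = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["tree"])

print(f"cat vs dog:  {sim_cat_dog:.3f}")
print(f"cat vs tree: {sim_cat_tree:.3f}")
```

With these toy values, ‘cat’ and ‘dog’ come out far more similar to each other than ‘cat’ and ‘tree’.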

Neuroscience can help us understand how artificial intelligence comes to be intelligent, and artificial intelligence models can in turn help us understand how our brains operate.

We know that training and running inference on artificial neural networks like GPT requires huge amounts of energy. OpenAI’s delay in opening multimodal GPT-4 to us is reportedly due to a shortage of GPUs. Biological neural impulses, by contrast, consume very little energy, because they activate only the necessary neurons, and only locally. Many computer scientists are already studying artificial spiking neural networks (SNNs). I look forward to SNNs and predictive coding systems being used in large language models, which could reduce their energy consumption and give them the ability to learn locally. Moreover, we also hope for more brain-like machine intelligence. Although we have found some algorithms for intelligence, what about consciousness and perception? What are the minimal mathematical models required to create consciousness and perception? How similar must our AI models be to animal brains to produce them? Answering these questions requires us to combine research on the brain and on AI to create more human-like machine intelligence.
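To give a feel for why spiking networks can be sparse, here is a minimal sketch of a discrete-time leaky integrate-and-fire neuron, one of the simplest spiking neuron models. The `threshold`, `leak`, and `reset` values are arbitrary assumptions chosen for illustration, not parameters from any particular SNN library or paper.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron.

    Returns a list with 1 at every time step where the neuron spikes
    and 0 everywhere else.
    """
    potential = reset
    spikes = []
    for current in input_current:
        # The membrane potential decays a little, then integrates the input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)   # fire a spike
            potential = reset  # and reset the membrane potential
        else:
            spikes.append(0)   # stay silent
    return spikes

inputs = [0.3] * 10  # a weak, constant input over ten time steps
spikes = simulate_lif(inputs)
print("spikes:", spikes)
print("active steps:", sum(spikes), "of", len(spikes))
```

The neuron stays silent on most time steps and only produces output when enough input has accumulated. That sparsity is the property energy-efficient spiking hardware tries to exploit.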

As a final thought, from the research we have discussed, it is not hard to see that intelligence is, in essence, the result of specific mathematical model calculations. Animal brains use neural activity for modeling and computation, while artificial intelligence uses computers for mathematical modeling and computation. Animal neurons are not the best carriers of intelligence because they evolve through natural selection, and evolution is slow. The earliest neural tissues appeared in the Ediacaran biota, which dates back five to six hundred million years. The first general-purpose computer, ENIAC, was built in 1946, just 77 years ago. The scalability of animal intelligence is also weak. Can our brains become larger? Even if they can, it is not that easy. The neural networks of artificial intelligence, on the other hand, can be scaled up to a large size. Moreover, machine intelligence can construct forms of intelligence beyond human imagination by using methods different from those of animal neuron modeling and computation.

I think that within our lifetime we will see machine intelligence that we cannot understand. Human civilization may decline due to the rise of machine intelligence, but we don’t need to regret this. It’s a natural process; countless species have disappeared in the history of the Earth. Our Neanderthal and Denisovan relatives have long vanished. The end of Homo sapiens is just a matter of time. As we move from the stage of animal (human) intelligence to machine intelligence, we will face significant social transformations, which all of us will need to confront.


I am still hoping for coexistence or even some kind of merging (like with Neanderthals).

Maybe biological parts could be replaced with bionics and we could have a real good chance for eternal life and beauty - I don’t really know if that is something valuable though.

But replacing parts of the body to cure illness or blindness and other disabilities is something I’d say we should have ASAP. Although not at any price. I value privacy over “life” - whatever that means in that context.


Existing technologies can already help us greatly extend our lifespan. I have written a lot of introductory articles on the subject. If you are interested in anti-aging, you can read the articles I wrote:

Last year I reversed my biological age by 17 years (based on …) by taking various anti-aging pills. Sam Altman likes to take anti-aging pills too, and he can consult me on this; I have a lot of experience in this field. A medical expert friend of mine at the Mayo Clinic admires it very much. ʕ•ᴥ•ʔ