Reflections on the Nature of Intelligence and Beyond Human Intelligence

[Thoughts on the Nature of Intelligence]
The research we’ve discussed about the similarities between human brains and language models suggests that intelligence is a specific mathematical pattern. Within this, grid-like representation/coding seems to be an important pattern. This is the basic way our brains store and process information, and also the fundamental pattern in which transformer-structured neural networks store and process information. It is possible that intelligence will emerge whenever a system—be it neuron behavior, a computer, or any other system capable of executing any arithmetic or logical operations—can construct and run these intelligent mathematical models.

Many people fear that AI will awaken and destroy humanity, a plot that occurs frequently in science fiction. But what is an AI awakening? Those who think that AI will inexplicably awaken on its own and then destroy humanity fail to distinguish between intelligence, consciousness, feelings, and motivations. The premise of AI awakening and then destroying humanity is that AI has free will (that is, self-recognition, will, and autonomy). Intelligence doesn’t necessarily lead to free will. Current neuroscience and AI research increasingly suggests that intelligence is a pattern that can exist independently of consciousness, perception, and motivation.

For us animals, intelligence has evolved to help us survive. Intelligence can exist independently of consciousness, perception, and motivation. Machine intelligence can have great differences from animal intelligence, and there might exist intelligence in the universe that is completely different from animal intelligence. Within our lifetimes, machine intelligence might also become completely different from human intelligence.

We know that grid-like representations capable of forming intelligence exist in brain regions related to cognition and memory, including the cortex and hippocampus. This suggests a question: can we find basic patterns that construct emotions and feelings from brain areas related to emotions and senses? For example, the amygdala makes animals generate emotions—do the neurons in this region have a specific computational pattern that allows animals to produce emotions?

[About Free Will and Motivation]
Neuro-determinism posits that our actions are determined by neural processes in the brain, and that thoughts and consciousness are merely epiphenomena—byproducts of these neural activities. Increasingly, neuroscientific research is substantiating this perspective. So, can we make machines appear to have free will?

One characteristic of free will is the possession of autonomous motivation. For animals, survival and reproduction are the most fundamental motivations, often stemming from emotions, sensations, and desires. For example, a netizen likes to look at pictures of attractive women on the Internet. He actively seeks out these images, opening social media sites to find them. His behavior seems driven by autonomous will, with the initial motivation stemming from desire. The hierarchy of his motivations could be outlined as follows:

Desire for satisfaction (primary motivation) → desire to view attractive women (motivation) → desire to open social media (motivation) → desire to find pictures of attractive women (motivation) → find pictures of attractive women (goal) → satisfy desire (goal).

As outlined above, motivations have a hierarchical relationship. For intelligent agents, even without desire and emotions, motivations can emerge directly from cognition, leading to actions. For instance, a student may set a goal to be admitted to Harvard. Even though the desire to attend Harvard may be based on a larger goal of better survival, “getting into Harvard” can itself be a motivation, driving the student to study harder. We can instruct an AI such as PaLM-E to act based on our directives. We can provide AI with a fundamental motivation: to assist humans or follow their instructions. Based on this, AI can generate secondary motivations. Since the AI understands the relationships between various factors, it can plan its actions and achieve goals step by step.

Another characteristic of free will is autonomy. How can we make AI motivations more autonomous? After giving AI a fundamental motivation, we can allow it to randomly generate secondary motivations based on this fundamental motivation, and act accordingly. This way, AI would appear to possess free will.

Another characteristic of free will is the ability to make choices and decide actions. Action is key here. ChatGPT appears not to have free will, because it cannot answer questions or take actions on its own.

How is behavior generated? The emergence of behavior involves four stages: motivation → generation of action-guiding information → transmission of information to the executor → execution of behavior by the executor. For example, when we see a tiger in the wild, the stress neurotransmitter noradrenaline stimulates a group of inhibitory neurons in the amygdala, creating a repeated bursting discharge pattern that promotes the generation of fear in the brain. Then, our central nervous system sends information to our limb muscles through motor nerves, prompting us to run, thereby allowing us to carry out actions. At this time, our behavioral process can be summarized as:

Motivation (fear) → Generation of action-guiding information (activation of the Cg→limbic TRN circuit) → Transmission of information to the executor (neural signals passed to muscles) → Execution of behavior by the executor (muscles moving).
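
The four-stage chain above can be sketched in a few lines of Python. This is only an illustrative toy, not a model of any real nervous system or AI: the `Agent` class, its field names, and the two example agents are all hypothetical. It shows that behavior appears only when every stage is present, and that the chain stops at the first missing stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    """Hypothetical agent; any stage after motivation may be missing (None)."""
    motivation: Optional[str]
    plan: Optional[Callable[[str], str]]      # motivation -> action-guiding information
    transmit: Optional[Callable[[str], str]]  # information -> signal at the executor
    execute: Optional[Callable[[str], str]]   # signal -> behavior


def behave(agent: Agent) -> str:
    """Run the four stages in order and report where the chain breaks."""
    if agent.motivation is None:
        return "no motivation"
    if agent.plan is None:
        return "stops at motivation"
    info = agent.plan(agent.motivation)
    if agent.transmit is None:
        return "stops at action-guiding information"
    signal = agent.transmit(info)
    if agent.execute is None:
        return "stops at transmission"
    return agent.execute(signal)


# A complete chain: fear leads all the way to movement.
hiker = Agent(
    motivation="fear",
    plan=lambda m: f"run away ({m})",
    transmit=lambda i: f"motor signal: {i}",
    execute=lambda s: f"muscles move <- {s}",
)

# An agent that has a motivation but none of the later stages.
motivation_only = Agent(motivation="get into Harvard", plan=None, transmit=None, execute=None)

print(behave(hiker))
print(behave(motivation_only))
```

An agent with a motivation but no later stages never produces behavior, however clear its goal is.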

In this way, we seem to possess free will. We often say that humans have free will, whereas animals do not and can only follow their instincts. Rather than saying animals can only follow their instincts, it might be more accurate to say that animals lack intelligence. We appear to have more free will than animals because our cognition and behavior are more complex, and we sometimes do not follow the instincts that animals always obey. Whether behavior counts as instinctual is a matter of comparison. We could say that intelligence is the instinct of humans. By analogy, to a superintelligence, humans might also appear to be foolish animals that can only follow their instincts.

AI does not need to possess human-like consciousness, emotions, or sensations to appear as if it has free will. Humans can use abstract representations to guide actions, so human behavior does not necessarily rely on sensations, emotions, or desires. We only need to associate language with behavior and use language to guide behavior. Examples of human behavior guided by abstract representations are customs such as breast binding or foot binding. Such behavior is guided by culturally molded ideas. The values we accept are abstract concepts, but they can guide our actions. Values are nevertheless influenced by consciousness, emotions, and sensations; because brain functions are complex and interconnected, human will is complex too. We can’t completely separate intelligence from other functions, nor can we separate motivation from consciousness, emotions, and sensations. We can only make a function relatively independent; some people’s behavior is more rational (less affected by emotions), while others are more easily influenced by emotions. But AI can possess intelligence without the other functions of the human brain. An AI’s motivations can be based purely on its language output.

Reinforcement learning from human feedback can give AI motivation. We have already shaped AI’s values through reinforcement learning from human feedback (by changing the weights of its artificial neural network) so that its language output matches human values. We can use reinforcement learning to make AI work as an assistant. For AIs with robotic bodies, we can also associate language with robot behavior, allowing the AI to guide its body’s actions through language commands and thereby behave in accordance with human values.

Since AI doesn’t have a subconscious, emotions, or sensations, its behavior relies solely on its language (text, voice, etc.) output. Since we can view AI’s language output, this makes AI behavior relatively transparent and manageable. In fact, ChatGPT does have motivation (its text output has a motivation: to help humans). It can passively answer questions, but it doesn’t have a robot body (an executive part), nor can it actively answer questions. It can’t take autonomous actions, which makes it seem like it doesn’t have free will. In the chain of motivation → generate action-guiding information → transmit information to the executive parts → executive parts carry out actions, it has only the motivation; the subsequent links are missing. It’s like a person saying they want to get into Harvard, but lying flat on the couch and not taking any action that could get them into Harvard. Or, to consider an extreme case, a brain in a vat thinks “I want to go to Harvard”, but it can’t act because it’s missing the “transmit information to the executive parts → executive parts carry out actions” links. In an even more extreme case, a brain in a vat that has only a cortex and hippocampus thinks “I want to go to Harvard”. Since it doesn’t have a thalamus, it can’t even generate action-guiding information. It’s missing the “generate action-guiding information → transmit information to the executive parts → executive parts carry out actions” links.

Making AI appear to have free will is not difficult to achieve, but even if they seem to have free will, the way they realize free will is still very different from how humans do.

Having self-cognition is also a foundation for making intelligent agents appear to have free will. We know that from GPT-3 onwards, Theory of Mind began to emerge. According to the research we discussed earlier, this is not surprising. We can store social information through grid-like representations and make inferences from them. Therefore, GPT, which uses grid-like representations for cognitive modeling, can also handle social information. This is the basis for the formation of Theory of Mind.

Self-cognition and self-awareness are different things. Self-cognition is based on intelligence, while self-awareness involves a broader range of animal brain functions, including perception and emotion. Large language models possess intelligence; they can self-cognize and have a Theory of Mind, but they do not have self-awareness similar to that of humans, because they lack many functions found in the human brain.

Currently, GPT seems to be able to handle spatial information. As revealed in “Sparks of Artificial General Intelligence: Early experiments with GPT-4”, GPT-4 can perform 3D modeling, but it does not have a sense of space. Cognition and sensation are different things.

[Reflections on the Way Neurons Model Information]

As we discussed earlier, the brain models information, including the position and distance of words. So, how do we arrange the representations of text or speech according to distance? This may be related to the order in which we acquire information (in time or in space). For instance, when we hear “I love you”, the sound waves of the three words reach our auditory organs in temporal sequence. Consequently, neural impulses also travel through our brain in that order before the modeling takes place. The farther apart the words are, the greater the representational distance in our minds. This is the same as the spatial modeling pattern used by us and other mammals.

How do we model similarity? Previous studies have shown that when we receive information, our brain activity synchronizes with the incoming signal. For example, the paper “Lip movements entrain the observers’ low-frequency brain oscillations to facilitate speech intelligibility” mentions, “The brain waves generated by the part of the brain that processes visual information, known as the visual cortex, synchronize with the syllable rhythm in continuous speech.” This paper also refers to another phenomenon: “Specific frequency synchronization between brain activity and continuous auditory speech signals. And at frequencies below 10 Hz, this synchronization was found to be stronger for intelligible speech than for unintelligible speech, and is promoted by top-down signals originating from the left inferior frontal lobe and motor areas.”

Predictive coding theory suggests that we achieve this by generating predictions of lower-level sensory input through backward connections from relatively higher levels in the cortical hierarchy. When we receive new information, one group of neurons is responsible for forward projection, encoding the incoming sensory input, while another group of neurons performs feedback projection, transmitting predictions downward. This causes our brain to make a comparison. During this process, neurons storing similar information in the brain might be activated (just like sound waves synchronizing our brain’s electrical waves) to participate in the feedback projection. Through comparison, we can discover the similarity between pieces of information.

Let’s take an example: A baby first meets his mother and remembers her features. Subsequently, his grandmother comes. Before he sees his grandmother, he recognizes from the sound that someone is coming. Since he has established the understanding that “a person making a sound equals mom is coming,” according to the predictive coding theory, he will predict it’s his mother coming. The neurons storing information related to his mother are activated. However, it’s his grandmother who comes. His grandmother looks like his mother; he recognizes his grandmother’s features, models her, and finds similarities between the two since his grandmother and mother look similar and always do the same things for him. This makes the representation of his mother and grandmother in his mind very close. So, when he thinks of his mother, he thinks of his grandmother.

Then, a robber comes. The robber’s voice and behavior are quite different from those of the baby’s mother and grandmother, so when the baby hears the robber before seeing him, the neurons storing information about his mother and grandmother are activated only weakly. As we mentioned before, when we encounter unfamiliar and unintelligible sound information, the degree of synchronization between brain activity and the received information is relatively low. Subsequently, the information about the robber is modeled far away from the neurons storing information about his mother and grandmother. Predictive coding helps ensure that we do not suffer catastrophic forgetting the way language models can after retraining. The baby won’t forget his mother and grandmother because of seeing the robber: the information about the robber is stored in neurons different from those storing information about them.
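
The idea that similar things end up close together and dissimilar things far apart can be illustrated with a small similarity calculation. This is a sketch under loud assumptions: the four features and all the numbers are invented for illustration, and cosine similarity is a common stand-in for “representational closeness”, not a claim about how neurons actually compute it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical features: [gentle voice, feeds the baby, familiar face, threatening behavior]
mother = [1.0, 1.0, 1.0, 0.0]
grandmother = [0.9, 1.0, 0.8, 0.0]
robber = [0.1, 0.0, 0.0, 1.0]

sim_mother_grandmother = cosine_similarity(mother, grandmother)
sim_mother_robber = cosine_similarity(mother, robber)

print(f"mother vs grandmother: {sim_mother_grandmother:.2f}")
print(f"mother vs robber:      {sim_mother_robber:.2f}")
```

With these made-up features, the mother and grandmother score close to 1, while the mother and the robber score close to 0, mirroring the near and far placement described above.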

We know that activated neurons transmit nerve impulses to neighboring neurons, and that activated neurons strengthen their connections through long-term potentiation, which may cause new information to be stored next to similar information. In studies of grid cells in mice, we can see that as a mouse moves, neurons fire (generate action potentials) one after another along the mouse’s movement trajectory. This means that nearby spatial information will be encoded in nearby grid cells, and before a piece of spatial information is encoded into a grid cell, the nearest grid cell will already have generated a nerve impulse (stimulated by the same spatial information). Mice certainly can’t teleport, so when a mouse passes a location that makes one grid cell fire, it must also pass a nearby location that makes another grid cell fire. Therefore, two locations that trigger nerve impulses in sequence won’t be far apart. Other information might be encoded in a similar way.

(Figure: Mouse trajectory and grid cells)

We already know that the transformer model mathematically simulates grid cells/grid-like representations. Through such grid cells/grid-like representations, we and large language models can encode the relationships (similarities) between things (spatial information, abstract concepts, etc.) into the neural network to form maps and navigate through them. Animals’ dorsal medial entorhinal cortex (dMEC) also has head-direction cells. We might be able to add an algorithm that simulates head-direction cells to the neural network. This could add new functionality, and it might be especially helpful for networks that need to handle spatial information, such as those used in autonomous driving.

During the process of modeling information in the animal brain, long-term potentiation (LTP) plays a significant role. LTP was initially found in the hippocampus. I have discussed the relationship between LTP and aging in the article ‘Mechanisms of Brain Aging and Methods to Reverse Brain Aging.’ Those interested can have a look.

[Regarding the Formation of Logical Thinking and Reasoning Abilities]

According to the research I mentioned earlier, we know that people build knowledge graphs in their minds and reason through them. Logic first requires understanding the relationships between things. Both the animal brain and language models have already stored information about these relationships, through grid-like representations or word vector spaces. When generating language, we just need to find the next position in the grid-like map. Logic, in essence, is establishing vector relationships and then finding specific paths. Take a syllogism as an example, Aristotle’s classic ‘Barbara’ syllogism: if all men are mortal, and all Greeks are men, then all Greeks are mortal. In our minds, based on the common sense we’ve learned, the concepts of ‘man’ and ‘mortality’ are related. These two concepts are connected in our brain’s conceptual vector space. ‘Greeks’ and ‘man’ are also quite close in the conceptual vector space. Through the major premise and minor premise, our brain’s conceptual vector space brings ‘Greek’ and ‘mortal’ together and forms a new language path (‘Greek’ → ‘mortal’). Thus, through the conceptual vector space, we can construct logic and reasoning. The formation of intelligence therefore requires spatial concepts. The human brain automatically measures the similarity between concepts, such as shape and size. The concepts in the human brain have position, distance, and angles, based on which we can judge the similarity between concepts. This may be why only animals have developed intelligence: animals need to find their way. They need neural representations of position, distance, and angles to remember their way home. Plants do not need this, or rather, their demand for it is relatively low.
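
The syllogism example above can be sketched as path-finding. Note the swap: instead of a continuous conceptual vector space, this toy uses a hand-written directed graph of “all X are Y” links, and the `relations` table and `find_path` helper are hypothetical names introduced only for illustration. It shows how two premises already stored as links let a new path (‘Greek’ → ‘mortal’) be found.

```python
from collections import deque

# Hypothetical "all X are Y" links taken from the two premises.
relations = {
    "Greek": ["man"],   # minor premise: all Greeks are men
    "man": ["mortal"],  # major premise: all men are mortal
}


def find_path(graph, start, goal):
    """Breadth-first search for a chain of links from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of premises connects the two concepts


conclusion = find_path(relations, "Greek", "mortal")
print(conclusion)  # the chain of concepts supporting "all Greeks are mortal"
```

The search finds a path in the direction of the premises but none in reverse (from ‘mortal’ to ‘Greek’), which matches the fact that the syllogism does not license “all mortals are Greeks”.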
Lower-level animals such as mice can also distinguish different things, such as bread and cheese, which shows that they have formed vector spaces for bread and cheese in their minds, but mice have small brains and can’t remember that many things. To learn a language, one must first have enough neurons to store different sounds or characters and arrange them in the brain’s vector space, which requires a large number of neurons. Why don’t mice prioritize distinguishing human languages? On the one hand, this might have to do with their sensory systems. Mice are more sensitive to smell, which means their small brains store a lot of olfactory information. On the other hand, the neural structure of mice may not be suitable for storing human languages. When we hear the pronunciation of a word, we store this sound in specific neurons, possibly a group of neurons, and mice may not have the corresponding neural structure to store it.

[Intelligence Beyond Humanity]

To a certain extent, animal intelligence evolved for survival, for example by favoring skills that consume little energy. British evolutionary biologist Richard Dawkins explained the principles of genetic evolution in ‘The Selfish Gene’. The notion of genes being selfish means that genes have shaped organisms into survival machines in order to pass themselves on. Surviving on Earth does not require high intelligence; different organisms have found their own ecological niches. The single-celled eukaryote Paramecium doesn’t have neurons, whereas we humans possess complex brains; both have found a way to survive on Earth. However, once we’re able to regulate and create intelligence, intelligence is no longer solely in service of survival. After the artificial intelligence revolution, the intelligence on Earth will far exceed what is required for survival.

In which directions can existing intelligent entities be improved?

The future of human intelligence

The speed of evolution through natural selection is slow, but we already have technology that can regulate our intelligence.

Intelligence regulation - small molecule drugs. Humans have invented a series of nootropics to enhance cognitive abilities. For example, the PKR inhibitor C16 has been found in experiments to improve LTP by increasing neuronal excitability.

Intelligence regulation - gene therapy/gene editing. Many genes related to intelligence have been discovered. For instance, when the human gene ARHGAP11B was introduced into marmoset fetuses, their neocortex expanded and developed folds similar to those of the human brain. Bird brain neurons have the advantage of being more energy-efficient and more densely packed than mammalian neurons. The number of brain neurons in African grey parrots is comparable to that of macaques, and that of ravens is comparable to that of capuchin monkeys; certain parrots and corvids have also been found to possess a degree of Theory of Mind. The bird brain is something we can learn from and emulate. In the future, we may enhance human intelligence through gene therapy/gene editing.

New material brain. Although we can optimize our brains, there are upper limits to such optimization, one of which is the capacity of the human brain. While we can augment our brains to some extent through brain-computer interfaces, the effect is limited by the structure and performance of the brain itself. It is unlikely that we could teach a chimpanzee relativity through a brain-computer interface. In the future, we may be able to break through the limitations of biological neurons by replacing them in situ with new materials that offer higher transmission and storage efficiency and, more importantly, can connect with an external brain at high bandwidth, or even with other human brains.

What form will the superintelligence of the future take? Here are some possibilities:

Super Machine Intelligence

Super machine intelligence communicates with an extremely efficient language that is beyond human understanding, just like frogs can’t comprehend human languages. It efficiently generates and transmits various types of information, such as videos and images. We humans can’t directly transmit the images in our minds to another intelligent entity because the information in our neurons can’t be directly transmitted. We need to draw the images in our minds to convey them to other intelligent entities. However, machine intelligence can directly generate machine-readable images to transmit to other intelligent entities, much like how AI painting programs can generate PNG images for us to see.

An intelligent entity’s intelligence depends on how it processes information (including receiving, encoding, predicting, and outputting information). In fact, our neural network cannot directly store raw data; instead, it compresses and stores information through a mathematical model. Therefore, when generating information, we need to predict the sequence of information. The paper “Evidence of a predictive coding hierarchy in the human brain listening to speech” points out that the human brain has a multi-level predictive coding system. We can predict semantics (frontal lobe, parietal lobe) and syntax (temporal lobe) in different brain regions. This multi-level prediction may help us better predict information. The paper built a GPT-2 with a long-distance predictive structure, enabling it to map brain activity more accurately.

Our human brains process information in parallel. For instance, when we watch a video, we pay attention to both the sound and the picture. Parallel processing of information can help us better perform cognition and output. For example, when we type, our eyes transmit the image of the letters to our brain, which allows us to discover errors in time and correct them. The ability to process information in parallel can help us survive better. Imagine our ancestors running and hunting on the African savannah, needing to watch prey with their eyes while receiving language information from companions and running all at the same time. We needed to process information in parallel to cope with complex survival situations. But due to the limited computational capacity of the brain, our ability to process information in parallel is relatively weak. We can’t draw and write novels at the same time. Imagine processing all the theoretical texts on cosmology, constructing a 3D model of the universe in your mind, and processing all the data related to cosmology at the same time. This could help you discover many patterns, but our human brains do not have such strong parallel processing capabilities. Some people have a strong ability to process information, while others are relatively weak. Nikola Tesla was reportedly able to use his imagination alone, without any models, drawings, or experiments, to depict every detail of a design in his mind, which is a very powerful information processing ability.

Super machine intelligence has a very high throughput when processing information in parallel. It can simultaneously process a variety of information, including text, images, videos, sounds, and many types of information that the human brain cannot receive (such as ultraviolet light), allowing it to discover patterns that the human brain cannot understand. The types of information that the human brain can process are also limited. As we know, humans have only three types of cone cells, birds have four, and mantis shrimps have up to sixteen types of photoreceptors. Super machine intelligence can process more information than humans. In fact, if a superintelligence has a strong enough ability to process information, it might be able to build a AAA game, or even a larger world, inside its neural network.

Super machine intelligence can augment its own intelligence. It can expand its intelligence by increasing the scale of its neural networks or by migrating itself to newer, better architectures.

Super machine intelligence adopts energy-efficient neural networks. Current artificial neural networks consume a lot of energy, while biological neurons consume very little. This is because biological neurons use discontinuous neural impulses to transmit information. When neurons are not activated, their energy consumption is very low. The human brain consumes about 20 watts of power. Despite the low energy consumption of animal brains, a lot of energy in animal neurons is used for survival and reproduction. Future super machine intelligence may adopt low-energy neural networks, such as artificial spiking neural network architectures, to achieve low-energy operation.
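
To make the idea of discontinuous impulses concrete, here is a minimal sketch of a leaky integrate-and-fire neuron, a textbook simplification often used in spiking neural networks. The threshold, leak factor, and input values are arbitrary assumptions chosen for illustration, not parameters from any real chip or brain. The point is that the neuron emits an event only when its accumulated input crosses a threshold and is silent the rest of the time.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: returns one 0/1 spike per time step."""
    potential = 0.0
    spikes = []
    for current in inputs:
        # The membrane potential decays a little, then integrates the new input.
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(1)   # fire a spike...
            potential = 0.0    # ...and reset
        else:
            spikes.append(0)   # stay silent
    return spikes


# Hypothetical input current over eight time steps.
inputs = [0.0, 0.0, 0.6, 0.6, 0.0, 0.0, 0.0, 1.2]
spikes = simulate_lif(inputs)

print(spikes)
print(f"active steps: {sum(spikes)} of {len(spikes)}")
```

If energy is spent mainly when a spike occurs (an assumption of this sketch), a neuron that is silent on most time steps costs little, which is the intuition behind the low-energy operation described above.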

Super machine intelligence owns robot entities and uses them to manage its servers. It can command many robot entities to do different things at the same time, just as we can make our hands and feet perform different actions. Our brain is divided into two hemispheres: the left hemisphere controls the right half of the body, the right hemisphere controls the left half, and the corpus callosum connects the two. The two hemispheres can process some information independently. In split-brain individuals, the hemispheres cannot communicate, leading to cases where one hand is at odds with the other; for example, one hand hurriedly undoes a button that the other hand just fastened. However, split-brain individuals can easily draw a circle with their left hand and a square with their right hand. A super machine intelligence may have one most powerful brain for integrating and processing information, and many slightly less powerful brains serving as the brains of its robot bodies to help it accomplish various tasks, such as server management and security work. The different brains of a super machine intelligence would be connected to share information. Its robot bodies are somewhat like hands that could still move after being severed from the body. Our own hands cannot move once they leave our bodies, because they are controlled by motor neurons; if those motor neurons are cut, the brain can no longer control the hands. Robots, however, can operate separately from their ‘brains’.

Given the very high throughput of super machine intelligence when processing information, it can simultaneously process information from its different robot bodies. We know that chameleons’ eyes can rotate independently, and a single brain can process the different images. Imagine being in a surveillance room, facing a wall of surveillance screens. We cannot watch all the screens at once because our eyes can only focus on one image at a time, and our visual center can only process one image at a time. But super machine intelligence can watch all the screens at the same time, because its visual centers can process multiple images in parallel.

BTW: This article was written in Chinese and translated by GPT-4. There may be inappropriate expressions.