First, I apologize. This may be off topic. It would be nice to have a forum to discuss theoretical issues.
This will be a logical argument based on the premise of the universal approximation theorem: the theorem that a neural network with one or more hidden layers can approximate any continuous function to arbitrary accuracy. Since this is true, it would follow that a neural network of sufficient depth and parameter count could approximate the outputs of all existing neural networks. Therefore, it would be possible to train a GPT-X network on the inputs and outputs of many existing NNs. This could be structured with natural language, including image content encoded in a way that the network can accept (and also predict).
For example:
This is an image of a cat [imagedata(128x128)=010101110…].
Then imagine a playground where you can type:
This is an image of a cat:
And the AI generates the image of a cat.
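To make the encoding concrete, here is a minimal sketch of how an image could be flattened into the bracketed bit string above. The prompt template and the helper names (`encode_image_as_text`, `make_training_example`) are purely my own illustration, not an existing API:

```python
# Minimal sketch: serialize a small binary image into a text prompt so a
# text-only sequence model could consume (and, in principle, predict) it.
import numpy as np

def encode_image_as_text(image: np.ndarray) -> str:
    """Flatten an HxW binary image into the bit string used in the example."""
    bits = "".join(str(int(round(p))) for p in image.flatten())
    h, w = image.shape
    return f"[imagedata({h}x{w})={bits}]"

def make_training_example(caption: str, image: np.ndarray) -> str:
    return f"{caption} {encode_image_as_text(image)}"

# Toy usage: a 4x4 "image" stands in for the 128x128 example above.
toy_image = np.random.randint(0, 2, size=(4, 4))
print(make_training_example("This is an image of a cat", toy_image))
```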
Are there problems with this formulation, or do you think it might work?
More or less, along the lines of the single example I gave. I envision this encompassing the full range of architectures (RL, GAN, classification, CNN, etc.) in one large network.
Few- or zero-shot learning across the range of tasks that current networks have been trained on. Fine-tuning with even less data. Interesting possibilities for generalization, and for the generation of custom networks by the AI system.
Thank you for the feedback. I think the points you raise are all important issues. As to how to quickly and efficiently train the parameters, there would need to be experimentation. Some way of incorporating the models directly, or executing them for results, would be interesting. You could imagine a future Codex being able to generate and execute code (even NNs) to produce its responses. I agree that it might not work to try to retrain everything; the existing network architectures and weights would need to be ingested or utilized through direct execution. Direct execution seems more feasible and would open the door for such modules to be refined without touching the GPT-X model itself.
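As a rough sketch of what "generate and execute to give responses" could look like: the model emits code, a separate process runs it, and the captured output is fed back as the answer. `generate_code` below is just a placeholder for whatever model call (e.g., a Codex-style completion) would actually be used; it is not a real API, and a real system would want proper sandboxing:

```python
import subprocess
import sys

def generate_code(prompt: str) -> str:
    # Placeholder: in a real system this would call the code-generating model.
    return "print(sum(range(10)))"

def execute_generated_code(code: str, timeout: int = 5) -> str:
    """Run generated code in a separate Python process and capture its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()

prompt = "Compute the sum of the integers 0 through 9."
code = generate_code(prompt)
print(execute_generated_code(code))  # -> 45
```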
GPT-X will never achieve AGI, for simple reasons that nevertheless take quite a few pages to explain fully. So I have listed a few places where you can find the information.
The premise you build on top of the universal approximation theorem is a guess; the theorem is by no means a proof of it. There is a lot of material on this topic. It is a very interesting mathematical tool, but it is not by itself the solution for AGI.
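For reference, the theorem (roughly, in its arbitrary-width form) only guarantees that an approximating network exists on a compact set; it says nothing about how to find or train it:

```latex
% Universal approximation, roughly stated: for any continuous f on a compact
% K \subset \mathbb{R}^n, any continuous non-polynomial activation \sigma,
% and any \varepsilon > 0, there exist N, \alpha_i, w_i, b_i such that
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,
  \sigma\!\left(w_i^\top x + b_i\right) \right| < \varepsilon .
```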
AGI is more a problem of the architecture of NNs than of a single powerful NN.
I would have agreed with you up until the last month or so. I believed cognitive architecture was necessary. After seeing what GPT-3 can do, and interacting with it personally, I feel much less convinced of that. That said, I think it will produce a very different kind of intelligence. It will likely be able to do nearly everything that humans can, but only when given the proper framing and instructions. There have been an increasing number of domains in which the GPT series has been able to exceed human capacity. Is there a reason to think that will stop?
I have watched many of Numenta’s videos and read Hawkins’ 1000 Brains book. I think they have made important contributions, but their account of the nature of intelligence may also be deeply misleading. For example, they entirely discount sub-cortical contributions to intelligence based on an outdated theory. Specifically, dividing the brain into a “less evolved” reptilian brain and a more evolved mammalian cortex is a disproven hypothesis not supported by neuroscience. I think human intelligence crucially involves both “older” and “newer” regions of the brain. I suspect you agree with that to an extent if you see cognitive architecture as important.
That said, apart from the serious problems with the Numenta model, I do believe that large-scale mental models are quite important. The grid-type cells hypothesized throughout the cortex, and the cortical columns, also make sense. They do seem to be throwing out a lot of research pointing to hierarchical processing, or maybe I’m just unfamiliar with how they account for the previous research on that subject (consider all of the recordings of neurons in the visual cortex that activate in very specific ways for feature detection and increasing levels of abstraction). I interacted with them on the subject a year or so ago and did not receive a response I found convincing.
I have tried to take what I thought was good in the Numenta model and build a working theory of human-level intelligence, if you are interested. I think mental models are the material upon which intelligence operates rather than reflecting intelligence per se.
Defining Human Level Intelligence: Operating on Mental Models
I think I need to format my thoughts into a literature review. This subject is not easy to debate on this forum. I wrote some loose thoughts here, but to make it productive it would be best if we could write a formal review with all the references.
Geoffrey Hinton has a very nice solution for the problems you mentioned. The best part is that some of it works.
Michael Graziano has a nice theory about attention mechanisms.
Also, I think the divergences start in the definitions of intelligence. I personally agree that intelligence is how we learn and store information about the world. The brain seems to have lots of general sensory-motor structures that do this hierarchical, parallel processing.
Numenta does not always get it right; that is part of the process. But what I have understood so far is that the brain division they use is an abstraction presented for the less technical reader, since there are citations about it at the end of the book.
I am happy with Numenta’s work because their abstraction led me to better understand GLOM and some other interesting hypothesized cognitive models.
I personally think self-attention-based architectures put us one step closer to AGI; now we need to figure out the rest of the architecture. Hinton’s work makes a lot of sense.
GPT-X will never achieve AGI alone by sheer virtue of the fact that it has no episodic or declarative memory. The “framing and instructions” you mention are the Cognitive Architecture that is required. In fact, I wrote a whole book about it. David K Shapiro - NLCA
Thank you for letting me know about your book. It looks very interesting. I will read it.
I think you make some good points about the need for episodic and declarative memory. I would point out, though, that humans are still capable of at least limited forms of intelligent processing with a non-functional episodic memory (e.g., in Alzheimer’s disease). Such patients often retain more distant episodic/declarative memories but can’t form new ones. Without episodic memory, an advanced GPT-X would be like a person with a neurological condition. I think that actually describes GPT-3 pretty well.
I agree that Hinton’s GLOM model is quite promising for building up the complex models for intelligence to operate on. I like that it seems similar to cortical columns, which is also why I immediately thought of Numenta the first time I encountered it. I don’t think it gets you to AGI. You need language and an array of other functions, which I think you probably agree with, since you cite, for example, attention mechanisms; that is one of many things that are needed. If we want to replicate intelligence that is very human-like, this is a positive step. Again, I think we can probably get all the functions that humans can perform out of something like GPT. Right now, you can fine-tune it to do things that humans do well, but out of the box it doesn’t (e.g., it performs very poorly at SAT-type analogies as is). It will soon be used for many high-level processes that only humans could do before. It feels to me like a breakthrough on the order of the Internet.
Maybe our basic disagreement, if one exists, is about how to get to human-like intelligence. If we are trying to get to something that acts and thinks much like a human, then I agree completely that cognitive architecture is probably the way to go. I have explored a number of the existing ones and find them interesting. One concept I found potentially very important in my exploration of cognitive architecture is competitive queuing, which is based on the basal ganglia, sub-cortical structures in the brain (a toy sketch follows below). I think this may be important for thinking and acting in the world. Whereas I think that in the near term a cognitive architecture might be the first to reach human-like intelligence, a model with sufficient parameters (think an order of magnitude more than the number of synapses in the human brain) and the right training might be able to simulate a human quite well, even with a very dissimilar architecture. It may even be possible to have far fewer parameters than the brain has synapses and exhibit as much or greater intelligence.
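For anyone unfamiliar with competitive queuing, here is a toy sketch of the basic idea. The action names and activation values are purely illustrative; the point is that a serial order emerges from a parallel plan representation:

```python
# Competitive queuing: candidate actions hold parallel activation levels; the
# most active one is selected and then suppressed, producing a sequence.
import numpy as np

actions = ["reach", "grasp", "lift", "place"]
activation = np.array([0.9, 0.7, 0.5, 0.3])  # parallel "plan" layer

sequence = []
act = activation.copy()
while act.max() > 0:
    winner = int(act.argmax())      # choice layer picks the most active item
    sequence.append(actions[winner])
    act[winner] = 0.0               # executed item is inhibited (suppressed)

print(sequence)  # ['reach', 'grasp', 'lift', 'place']
```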
I’m not sure I completely agree that Numenta uses the reptilian/mammalian brain division just as an explanatory tool. Hawkins states many times that he believes intelligence rests entirely in the cortex, as I recall from several interviews, videos, and his own book. It seems fundamental to his theory. I’ll add another objection to his theory: it is a potentially very misleading assumption that all cortical columns are performing the same function. This feels more like wishful thinking than something clearly true; it is a hypothesis worth testing rather than assuming. It would obviously be great if true. He also discounts the value of emotions based on the incorrect model that they represent a more primitive process (as if they were a step backwards somehow). I may be wrong, but my intuition is that we may ultimately want AI to have emotions if we are to connect with it (which is what many will want). One way to think about emotions is that they are a measure of important problems or inconsistencies (think fear, anger, sadness) or, conversely, of important consistencies within the model (think joy, happiness). If we want unmotivated systems, then leave out the emotions. I don’t want to be too hard on Hawkins, because I have learned a lot from him and his team. He has also spent decades trying to solve intelligence and, as far as I can tell, paid his team out of his own pocket. He deserves some kind of award (Turing Award?).
I appreciate your thoughts. It would be good to have a review and synthesis of these ideas.
Just this small interaction taught me so much. I believe we need a synthesis of these ideas and more careful thinking. Going from theory to code is already so challenging; imagine going from thought to code in the wrong direction.
I agree that it would be good to have this systematically integrated and to begin trying to tie it in with actual programming. There are a few more concepts I’d like to add to the discussion, which I think are necessary for human-like intelligence.
I want to add one very positive thing that I think comes from the Numenta research. One thing they find in their examination of cortical columns is that each column has a motor output, which suggests that the whole cortex contributes to the processing of future actions. A focus on motor output is almost completely lacking in architectures like GPT. You could consider the output tokens and associated text as being a motor output. I think it would be good to give more design consideration to the types of motor outputs that these systems have. This is also important from a safety perspective (accessing the web for updated or specific information could be a motor action where safety should be considered; see the sketch below). An advanced Codex might execute its own generated code and predict the output. Human-like intelligence would involve the ability to generate, store, modify, transmit, predict, and execute programs (many of these functions are enabled by language).
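A rough sketch of what I mean by treating each external effect as an explicit "motor action" behind a safety gate. The action kinds and the allow-list below are illustrative assumptions, not a proposed standard:

```python
from dataclasses import dataclass

@dataclass
class MotorAction:
    kind: str      # e.g. "emit_text", "fetch_url", "run_code"
    payload: str

ALLOWED_KINDS = {"emit_text"}          # conservative default: text output only

def motor_gate(action: MotorAction) -> bool:
    """Return True if the action may be executed."""
    return action.kind in ALLOWED_KINDS

for action in [MotorAction("emit_text", "Hello"),
               MotorAction("run_code", "print('hi')")]:
    print(action.kind, "->", "execute" if motor_gate(action) else "blocked")
```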
If you consider humans, or other advanced organisms, their motor output has an association with, and effect on, sensory input. In other words, you can change your processing errors by moving your sensors or taking other motor actions. Thinking might be a very complex motor output that involves a dissociation, or partitioning function, between cognitive processing and motor output (see Procedural Behavioral Models in my theory). Consider the theory, which has some support, that we learn to think by first verbally imitating other people and later doing this internally without executing the full motor program (saying something out loud).

I consider partitioning (or dissociation) to be a key feature and function of intelligence. You could think of this partitioning as a disconnection between cognitive processing and motor output, or as a form of cognitive processing that is disconnected from other cognitive processes that may typically be associated with the same process. Consider a dog that figures out how to go around an obstruction to achieve a goal. There must be processing that is disconnected from the motor outputs, which allows the problem to be solved (in some type of mental simulation). Any planned action requires a partition between the cognitive processing and the motor outputs. Any time you learn something new that conflicts with an old pattern, there must be a partition between the new pattern and the old pattern. On a neural level, the old pattern does not immediately disappear; it is partitioned off from the new pattern and continues to compete with it. There is neuroscience research to back that up. A relevant concept is gating in the frontal lobes of the brain (look up gating effects and go/no-go tasks).
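To illustrate the partitioning idea in code: the same planning process can run with the motor gate closed (inner simulation, or rehearsal) or open (overt action), with a go/no-go signal deciding which. Everything here is a toy illustration, not a model of frontal-lobe gating:

```python
def plan(goal: str) -> list[str]:
    # Stand-in for mental simulation: produce a candidate action sequence.
    return [f"step toward {goal}", "circumvent obstacle", f"reach {goal}"]

def motor_system(action: str) -> None:
    print("EXECUTING:", action)

def run(goal: str, go_signal: bool) -> None:
    candidate = plan(goal)                 # simulation happens either way
    if go_signal:                          # "go": gate open, act overtly
        for action in candidate:
            motor_system(action)
    else:                                  # "no-go": rehearse covertly only
        print("rehearsed silently:", candidate)

run("food behind fence", go_signal=False)  # covert simulation
run("food behind fence", go_signal=True)   # overt execution
```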
I describe that phenomenon extensively in my book, I call it the “inner loop”.
Anyways, the brain (and proto-brains) evolved as a way to (1) interpret sensory information and (2) coordinate motor functions. Be sure you don’t put the cart before the horse. Cognition, thought, planning, and communication all came about in service to those original functions. And those 2 original functions are strictly in service to one thing: survival.
Anyways, no, language is not required to think. Read some Steven Pinker; there are plenty of humans who have no inner monologue, and they are the equal of other humans in problem solving and reasoning. However, I do make the assertion that Natural Language is a “good enough” medium for thought, as opposed to inarticulate “mentalese”.
Thanks for your thoughts. I am aware that a percentage of people claim to have no inner voice. At first glance, I am not convinced that this is anything more than a partitioning of awareness rather than genuinely thinking without language. I just don’t believe it would be equivalent, but I am open to changing my mind. That said, it is a good point, and it would be worth categorizing the types of thought that require language versus those that don’t. I do believe there are many other mental functions that do not use language. A more precise definition of thought would be good.
You know, it is quite strange that the entire time I have practiced as a psychologist, I have never encountered anyone who doesn’t have an inner voice. I don’t want to discount it yet, because being unable to imagine something doesn’t mean it doesn’t exist. Yet, it seems likely that some tasks require inner thinking in language.
Here is what one person says about realizing that others have an inner voice.
“When I hear that other people have like a constant kind of dialogue and stream in their head and that when they’re doing a task they’ll just be thinking about things the entire time they’re doing a task, it actually kind of feels a little overwhelming,” she said. “How do you deal with that and what does that feel like?”
Those are some sentences that are put together quite well, and they imply a lot of linguistic processing to come to those conclusions. It is clear from the words she uses that she is focusing on the feelings rather than the meanings, with words like “overwhelming” and “what does that feel like?” I’m not convinced yet, because I would need an explanation for how such people read, listen to other people talking, form verbal concepts and associations, and so forth. Anyway, maybe that is going down an unnecessary path, as it probably won’t be possible to resolve every controversy here.
I have been reading your book and am finding it to be educational. It seems like a solid approach so far. I can tell that you put an enormous amount of work and thought into this project. Maybe I can help in some way once I understand more. I like the inner loop and how you conceive of thinking.
As to choosing what to think about, you note that we are more likely to think about novel or neglected things. It is good that you bring dreaming into the mix too. I think dreaming is data augmentation on steroids (among other things). If you haven’t already, you could further refine the architecture to encompass task-negative and task-positive mode networks (a toy sketch of the switch follows below). In the human brain, we tend to think about ourselves or others when we are not engaged in a goal-directed task (the task-negative network). This could be an additional refinement to the novel and neglected elements (novel and neglected elements relating to self and others). When engaged in a task, the task-negative network becomes inactive and the task-positive network activates. Sorry if you have already thought of this.
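Here is roughly what I mean, as a toy sketch of adding that split to an inner loop: when no external task is active, attention defaults to novel or neglected items about self and others; when a task arrives, the default mode is suspended. The names, priorities, and scoring are illustrative assumptions only:

```python
import heapq
import itertools

counter = itertools.count()

def push(queue, topic, priority):
    heapq.heappush(queue, (-priority, next(counter), topic))

def inner_loop_tick(task, default_queue):
    if task is not None:                    # task-positive mode
        return f"task-positive: working on '{task}'"
    if default_queue:                       # task-negative (default) mode
        _, _, topic = heapq.heappop(default_queue)
        return f"task-negative: reflecting on '{topic}'"
    return "idle"

default_queue = []
push(default_queue, "neglected memory about a friend", priority=0.8)
push(default_queue, "novel fact about myself", priority=0.9)

print(inner_loop_tick(task=None, default_queue=default_queue))
print(inner_loop_tick(task="answer the user's question", default_queue=default_queue))
```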
There are quite a few open items for task-specific cognition. Humans, for instance, can instantly prioritize task switching; NLCA today cannot. But I think my work provides a good foundation for others to iterate on.
One idea I had was to have multiple inner loops working in parallel, each one attending to different aspects. This would soon give rise to truly alien intelligence, something that can think like us, but with wholly novel abilities.
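Purely as a toy illustration of what I mean (the thread-based loop and the `attend_to` placeholder are just stand-ins, not NLCA code): several inner loops run in parallel, each attending to a different aspect and writing its conclusions to a shared memory.

```python
import threading
import time

shared_memory = []
lock = threading.Lock()

def attend_to(aspect: str) -> str:
    # Placeholder for a model call that reflects on one aspect of the situation.
    return f"conclusion about {aspect}"

def inner_loop(aspect: str, ticks: int = 3) -> None:
    for _ in range(ticks):
        thought = attend_to(aspect)
        with lock:
            shared_memory.append((aspect, thought))
        time.sleep(0.01)

threads = [threading.Thread(target=inner_loop, args=(a,))
           for a in ["user intent", "long-term goals", "recent events"]]
for t in threads: t.start()
for t in threads: t.join()

print(len(shared_memory), "thoughts recorded")
```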
Great ideas. I think GPT is already a kind of alien intelligence and eliciting the intelligence is wholly dependent on how you prompt it. Which I think is how you implement NLCA. Very smart. Do you have a working chat bot based on NLCA or some other implementation?