Thanks for indulging me. I will note, however, that you were the first one to veer from operational computer science into philosophy when you made the claim that ChatGPT does not understand anything.
You keep saying that it is “only” predicting the next token. I could similarly say that human speech is “only” sound waves. What happens under the covers to achieve the sound wave or the prediction is what we are discussing.
I want to quote OpenAI's Chief Scientist, Ilya Sutskever, on the relationship between prediction and understanding:
SPENCER: So here with GPT-3, we’re trying to get it to predict the next word. What’s the connection between predicting and understanding?
ILYA: There is an intuitive argument that can be made, that if you have some system, which can guess what comes next in text (or maybe in some other modality) really, really well, then in order to do so, it must have a real degree of understanding. Here is an example which I think is convincing here. So let’s suppose that you have consumed a mystery novel, and you are at the last page of the novel. And somewhere on the last page, there is a sentence where the detective is about to announce the identity of whoever committed the crime. And then there is this one word, which is the name of whoever did it. At that point, the system will make a guess of the next word. If the system is really, really good, it will have a good guess about that name; it might narrow it down to three choices or two choices. And if the neural network has paid really close attention (well, certainly that’s how it works for people), if you pay really close attention in the mystery novel, and you think about it a lot, you can guess who did it at the end. So this suggests that if a neural network could do a really good job of predicting the next word, including this word, then it would suggest that it’s understood something very significant about the novel. Like, you cannot guess what the detective will say at the end of the book without really going deep into the meaning of the novel. And this is the link between prediction and understanding, or at least this is an intuitive link between those two.
SPENCER: Right. So the better you understand the whole mystery novel, the more ability you have to predict the next word. So essentially understanding and prediction are sort of two sides of the same coin.
ILYA: That’s right, with one important caveat, or rather, there is one note here. Understanding is a bit of a nebulous concept like, what does it mean if the system understands one concept or doesn’t? It’s a bit hard to answer that question. But it is very easy to measure whether a neural network correctly guesses the next word in some large corpus of text. So while you have this nebulous concept that you care about, you don’t have necessarily a direct handle on it; you have a very direct handle on this other concept of how well is your neural network predicting text and you can do things to improve this metric.
SPENCER: So it sort of operationalizes understanding in a way that we can actually optimize for.
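To make that operationalization concrete: the metric Ilya describes is simply "how often does the model guess the next word correctly over a corpus." Here is a toy sketch using a bigram counter rather than a neural network (the tiny corpus and word-level tokens are my own illustration, not anything from the interview), just to show that next-word prediction is a directly measurable quantity you can optimize:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word transitions in a list of tokens."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed continuation of `word`, or None."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

def next_word_accuracy(counts, corpus):
    """The 'direct handle' Ilya mentions: fraction of next words guessed correctly."""
    pairs = list(zip(corpus, corpus[1:]))
    hits = sum(predict_next(counts, p) == n for p, n in pairs)
    return hits / len(pairs)

# A made-up mini "mystery novel" corpus for illustration only.
tokens = ("the detective named the butler . "
          "the detective questioned the suspect").split()
model = train_bigram(tokens)
```

An LLM differs from this toy in scale and in predicting over contexts rather than single words, but the training signal is the same kind of thing: a measurable prediction score that stands in for the nebulous notion of understanding.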
Now, you have made a few claims that I would like you to back up with quotes or references. If they are true, I would like to, um, understand better:
1. Neuroscientists have a strong understanding of understanding, and can measure it in ways more direct than administering quizzes or puzzles.
2. Understanding is known to be directly linked to consciousness, not orthogonal to it. (Since neuroscientists don't really understand consciousness, I'm deeply skeptical that they have a model tying the equally nebulous "understanding" to it.)
3. Neuroscientists have stated that the neurons in human brains can house "understanding," but the neurons in an LLM (no matter how large) cannot. Why not?
In terms of code I've written…I ran a fine-tune today with only 30 examples, and it already outperformed my prompt. I was very impressed.
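For reference, the workflow was roughly the following sketch. The example inputs/outputs below are hypothetical placeholders (my real 30 examples aren't shown), and the model name and upload step at the end assume the OpenAI Python SDK's fine-tuning interface:

```python
import json

# Hypothetical placeholder training pairs; substitute your own 30 examples.
examples = [
    {"messages": [
        {"role": "user", "content": f"input {i}"},
        {"role": "assistant", "content": f"desired output {i}"},
    ]}
    for i in range(30)
]

# Fine-tuning expects one JSON object per line (JSONL).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Uploading and launching the job (requires an API key; commented out here):
# from openai import OpenAI
# client = OpenAI()
# file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
# client.fine_tuning.jobs.create(training_file=file.id, model="gpt-3.5-turbo")
```

Thirty examples is near the low end of what the fine-tuning endpoint accepts, which is what made the result so striking.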