Rumor: GPT-4 to be 500x larger, available in a few years

I’ve had my eye on Cerebras for a while. I hope all this is true. If so, we can expect to have AGI/superhuman intelligence within a few years. The only barrier then will be cost (energy and chips), which should still come down over time.

Playing with various numbers and trends, such as energy per FLOPS, number of transistors, and so on - I keep arriving at the same year for when computers reach parity with human brains: 2040.

While GPT-4 may have superior intellectual abilities, it will be many orders of magnitude more power hungry than a human brain.

Anyways, on short tasks, I would argue that GPT-3 is already intellectually superior to most humans. For instance, during one fine-tuning experiment I asked DAVINCI to make inferences about various topics by asking pointed questions. It performed very well on technical and medical situations. This means that a plain vanilla GPT-3 bot can probably help out on any QA forum. Obviously there are regulations and safety to think about, but the technical ability is there.

I still believe that my work, a Cognitive Architecture, will be necessary to give the transformer structure - mostly a way to access and organize long-term memories. But data retrieval is a well-researched problem. Otherwise, there is merely some structure required to plumb all the cognitive functions together.
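As a minimal sketch of what that kind of memory retrieval could look like, here is a toy similarity search over stored interactions. The bag-of-words embedding is a stand-in assumption so the example is self-contained; a real system would use learned embeddings from an encoder:

```python
# Toy long-term memory store: recall past interactions most similar to a
# new query. The bag-of-words "embedding" is an illustrative stand-in.
import math
from collections import Counter

def embed(text):
    """Toy embedding: term-frequency counts of lowercase words."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Store past interactions; recall the most relevant ones for a query."""
    def __init__(self):
        self.memories = []  # list of (text, embedding) pairs

    def add(self, text):
        self.memories.append((text, embed(text)))

    def recall(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.add("User asked about fine-tuning DAVINCI on medical Q&A")
store.add("User prefers concise answers")
store.add("Discussion of Cerebras wafer-scale hardware")
print(store.recall("medical fine-tuning question", k=1))
# → ['User asked about fine-tuning DAVINCI on medical Q&A']
```

The recalled memories would then be prepended to the prompt, which is the "plumbing" part: the retrieval itself is standard.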

Anyways, I’m drifting from the main point. This is exciting news if true.


Apparently, Sam Altman was saying the next version will have better training, but the same number of parameters. I can’t find the link for some reason.


Hey @daveshapautomator thanks for sharing this! Just curious, what are you using as your estimate for the processing capacity of the human brain (in FLOPS)? I’ve seen a number of different estimates over the years and this is a fascinating topic to me. Also, assuming compute capacity reaches parity (or beyond) with the human brain, do you think that that is all it will take for creating truly sentient machines? Or by AGI/superhuman, are you just referring to outperforming humans at most computing / data processing tasks? Thanks again!


This is one of the best academic posts on the topic. There are quite a few more:

But then you can also look at another metric beyond FLOPS - watts-per-FLOPS

Every time I slice and dice these numbers, I generally arrive at the same time window that Ray Kurzweil did: 2040 to 2045.
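As a back-of-envelope sketch of how that kind of slicing and dicing can land in the 2040 window: the brain figure, the accelerator figure, and the doubling time below are all rough assumptions on my part, not established facts, and different choices shift the answer by several years either way:

```python
# Back-of-envelope: when does machine efficiency (FLOPS per watt) reach
# brain parity? Every number below is a rough assumption:
#   brain: ~1e16 FLOPS on ~20 watts
#   2021 accelerators: ~1e12 FLOPS per watt (roughly A100-class, FP16)
#   efficiency doubling time: ~2 years (a Koomey's-law-style trend)
import math

BRAIN_FLOPS = 1e16
BRAIN_WATTS = 20
MACHINE_FLOPS_PER_WATT = 1e12
DOUBLING_YEARS = 2.0

brain_flops_per_watt = BRAIN_FLOPS / BRAIN_WATTS      # 5e14 FLOPS per watt
gap = brain_flops_per_watt / MACHINE_FLOPS_PER_WATT   # ~500x shortfall
parity_year = 2021 + DOUBLING_YEARS * math.log2(gap)
print(f"parity around {parity_year:.0f}")             # lands near 2039
```

Swap in a bigger brain estimate or a slower doubling time and the answer drifts toward 2045, which is why the whole 2040-2045 window keeps coming up.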

Now, to your question about sentience and superintelligence:

  1. I believe that we will have functionally sentient machines very soon. For instance, my work on Cognitive Architectures shows that you can integrate just a bit of data about “self” (computational parameters, status of services, etc) and the machine can understand its own existence with GPT-3, at least from a functional standpoint.
  2. I separate functional sentience from philosophical sentience - whether something has a “true” subjective experience. This question is only important for philosophical considerations such as suffering and rights. However, I think this conversation is an exercise in fantasy for a few reasons:
  3. Living organisms evolved with pain and fear as central to our survival strategy. Hence why suffering is central to most of our philosophy and religion. In Buddhism, they say that life is suffering and the path to relief is through radical acceptance. In Christianity, Hinduism, Judaism, and Islam, they all say that this world is a degenerate version of reality, slowly decaying due to distance from deity, and that only through death (and some kind of rebirth) can we attain relief from suffering. Absolutely none of this has any bearing on the operation or design of AGI systems.
  4. Therefore, the fixation on sentience and personhood is an exercise in fantasy, a red herring that everyone should stop discussing forever :stuck_out_tongue: Even if machines can approximate life, think the way we think, and pretend to understand pain, love, and fear, it will always be a facsimile. This is not a weakness, but a strength. It means that we can create machines that will never act out of a sense of vengeance or self preservation. Sure, we could encode primal reflexes into machines, but why would we do such an idiotic thing when humans are vindictive enough already?
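For what it’s worth, the “functional sentience” idea in point 1 can be sketched as a simple prompt-assembly step. The status fields and wording below are hypothetical illustrations I made up for this example, not the actual architecture:

```python
# Sketch of injecting "self" data into a prompt so the model can reason
# about its own state. The status dictionary and prompt wording are
# hypothetical illustrations.
def build_self_context(status):
    """Render a dict of self-status fields into a prompt preamble."""
    lines = [f"- {key}: {value}" for key, value in status.items()]
    return "You are an autonomous agent. Your current state:\n" + "\n".join(lines)

status = {
    "model": "davinci (finetuned)",
    "memory_service": "online",
    "tokens_used_today": 48210,
}
prompt = build_self_context(status) + "\n\nQuestion: Are all of your services healthy?"
print(prompt)
```

The point is only that “awareness of self” here is data in the context window - functional, not philosophical.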

I suspect that, with just a bit more work on my cognitive architecture and finetuning, GPT-3 could already prove to be superhuman at most intellectual tasks. However, GPT-3 is prohibitively expensive to use, not to mention bad for the environment due to the astronomical amount of power it consumes. There are also numerous technical limitations to GPT-3 which prevent it from being widely useful without a lot of fine crafting, testing, and tweaking for true AGI tasks. However, these are all temporary problems, hence why I shared the link about GPT-4.

Will we ever achieve “true AGI”? Probably not, according to some people. As I remarked to a fellow researcher: humans are trolls and have an endless capacity to move the goalposts. GPT-5 might be powerful enough to anticipate every single person’s thoughts for 8 days while managing the entire global economy, but some twerp is going to say “Yeah well, but it’s not actually sentient and can’t feel love, so it doesn’t count.” I predict that we will arrive at a time when most humans decide to delegate their votes and political influence to superintelligent machines expressly because of their lack of human bias. I also think that, at that point, we will laugh at the idea that we ever wanted to recreate human intelligence, given our numerous cognitive and emotional shortcomings. When we built telescopes, we were not simply trying to copy human eyes; we built machines far superior to our own eyes. So too will it be with intelligent machines.

I write about much of this in my book, if you’re curious.


For starters, thanks for the links and your very well articulated response.

I especially like the following:

I think this is one of the most coherent and rational things I’ve heard someone say about AGI - brilliantly put!

I agree with every word you wrote here and will be reading your book for sure.

Thanks again!


As @bakztfuture said, these rumors seem to be untrue based on a Q&A from Sam Altman.

was the link I read that from as well. Currently, the site seems to be having issues.


It’s always possible. It’s important to note that the article I linked was quoting someone from Cerebras, so perhaps it was a misunderstanding or mere speculation. Still, interesting.


I calculated that overcoming the quadratic attention window problem to reach book-length generations would require an increase of ~3000x in performance – A prospectus for long-form completions. I can’t find good figures, but I think we get there sooner than 2040.
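The quadratic scaling behind that kind of estimate is simple arithmetic. Assuming a ~2048-token GPT-3-era window and a very rough ~100k tokens for a book (both figures are illustrative assumptions):

```python
# Rough arithmetic behind a ~3000x-class figure: self-attention cost grows
# with the square of context length, so scaling from a ~2048-token window
# to a book-length ~100k-token window multiplies attention compute by
# roughly 2400x. Token counts are illustrative assumptions.
current_context = 2048      # GPT-3-era window, in tokens
book_context = 100_000      # very rough token count for a short book

scale = (book_context / current_context) ** 2
print(f"~{scale:.0f}x more attention compute")  # ~2384x
```

Push the book estimate a little higher and the ratio clears 3000x, so the exact multiplier depends heavily on what you count as "book-length."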


Oh, I definitely believe we’ll get to that point before 2040. Considering the exponential growth in performance between GPT-2 and GPT-3, I anticipate that whatever the next iteration is will knock the socks off all of us. Certainly, there will still be issues with long-term recall, as neural networks have no way to retrieve episodic memories. But still, the larger the context you can give it, the more information it can integrate and work with in one shot.


More grist for the rumor mill

Altman said in the interview that contrary to popular belief, GPT-4 will not be any bigger than GPT-3 but will use more compute resources. This is an interesting announcement considering the vocal voices against the perils of having large language models and how they disproportionately affect both the environment and the underrepresented communities. Altman said this would be achieved by working with all the different aspects of GPT, including data algorithms and fine-tuning.

He also said that the focus would be to get the most out of smaller models. Conventional wisdom says that the more parameters a model has, the more complex tasks it can achieve. Researchers have been increasingly speaking up about how the effectiveness of a model may not necessarily be as correlated with its size as believed. For instance, recently, a group of researchers from Google published a study showing that a model much smaller than GPT-3 — fine-tuned language net (FLAN) — delivered better results than the former by a large margin on a number of challenging benchmarks.

So it seems like there’s some conflicting information. Maybe the Cerebras dude was spilling some secret beans or something. Anyways, it looks like some neat stuff is coming down the pipeline either way.

Thinking about this - I am wondering if the Cerebras WSE-2 allows for multiple instances of GPT-3 to run on it in parallel. Certainly, a large CPU could run containers. It’s also much more efficient at a lot of tasks than other systems. So now I’m wondering if this partnership will actually mostly be used to drive down the operating costs of running GPT-3? Given how big it is, it could also be used to accelerate training and experimentation. For my use case, it would be far better if it made GPT-3 cheaper and faster. Right now, DAVINCI is prohibitively slow and expensive.


This is the best answer on the topic I’ve seen in a long time.


That might also allow the model to be updated on a more regular basis, which would be nice.


I’m really excited to read your book @daveshapautomator thanks so much for sharing your insights on this topic


I also strongly agree with your points. Since we only understand cognition in terms of our own intelligence, emotions and experiences; even our most ambitious attempts to achieve AGI will only emulate what we already know. And as you said, there’s nothing wrong with that.

Now … I do fear for the day we somehow manage to create something that redefines what it means to “think” :fearful:.

I added your book to my reading list – it looks very interesting.


Thanks, both of you. Yeah, I’ve thought about what it might be like to create a machine that “thinks” vastly differently from us, but there are still poorly understood mechanisms within our own cognition. If you study things like the unconscious, archetypes, the inner critic, and theory of mind, it becomes abundantly clear that we can have multiple entities sharing our head. I wouldn’t be surprised if it comes out eventually that a significant proportion of people actually have multiple “personalities” that they are completely unaware of. Our minds are already quite exotic.


Great point! The randomness of human personalities adds a level of complexity that I can’t imagine even a “successful” AGI emulating – beyond simply mimicking a set of clearly defined personality traits. Individual personalities vary so much based on situation, environment, and genetics. No two personalities are exactly the same.

In a fantasy world where you have near unlimited computational power, if you simulate the conditions that lead to the evolution of philosophical sentience with evolutionary models capable of mutating any part of their structure, would that overcome the hurdle of true sentience having evolved through suffering?

You’ll run into the same problems we have today: how can you prove philosophical sentience? How can you prove that anything has qualia? Anyways, I think it would be a huge ethical violation to deliberately create something capable of suffering. Not to mention it would be a horrible idea.

Is creating something capable of suffering in this way ethically different from having a baby, or otherwise causing the creation of a biological organism?