Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance

Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research

In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of model scale. Source: Google Blog


I’m curious how big the context window is and how expensive the model is to run. In my research, one of the biggest limitations is window size: it effectively caps the “working memory” of any AGI at the present maximum of 4000 tokens. While the size of human working memory is debatable, one need only see how much contextual information an LLM requires to make correct decisions to realize that our brains automatically recruit hundreds (if not thousands) of unconscious memories when helping us make decisions.
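To make that ceiling concrete, here’s a minimal sketch of what a fixed context window does to a system’s “working memory.” Everything in it is illustrative: a real system would use the model’s own subword tokenizer rather than whitespace counting, and the 4000-token limit and function names are placeholders, not anything from PaLM.

```python
# Illustrative sketch only: whitespace counting is a crude stand-in for
# the model's real tokenizer (e.g., a SentencePiece vocabulary).
MAX_CONTEXT_TOKENS = 4000  # placeholder for the deployed model's actual limit


def count_tokens(text: str) -> int:
    """Rough token count: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(memories: list[str], prompt: str,
                  limit: int = MAX_CONTEXT_TOKENS) -> str:
    """Pack as many recent memories as fit ahead of the prompt.

    Oldest memories are dropped first; whatever falls outside the
    window is simply gone, which is the "working memory" ceiling
    described above.
    """
    budget = limit - count_tokens(prompt)
    kept: list[str] = []
    for memory in reversed(memories):  # walk newest-first
        cost = count_tokens(memory)
        if cost > budget:
            break  # everything older than this is forgotten
        kept.append(memory)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n".join(kept + [prompt])
```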

Still, I’m glad to see some scientific confirmation of what I’ve been saying for a while: LLMs can outperform humans. We’re seeing the groundwork being laid for superintelligence.

One thing I’ve noticed about LLMs is what I’m calling implicit agency. The prompts used in PaLM are written in the first person: “I will explain this joke.”

This has some interesting implications for consciousness, ego, and agency in humans. In my experiments with cognitive architectures, it’s pretty difficult to maintain a first-person POV because the model ultimately doesn’t know what it is; it’s just guessing what “I” means from its training data. Instead, I’ve found it much easier to define what “I” am in the third person, or at least explicitly at the beginning of my prompts. Thus, my current prompts all start with “I am an artificial intelligence.”
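For anyone who wants to try the pattern, here’s a minimal sketch of what I mean. The preamble wording and the `build_prompt` helper are my own illustration, not anything from PaLM or any particular API:

```python
# Illustrative only: the preamble text and helper name are placeholders.
IDENTITY_PREAMBLE = (
    "I am an artificial intelligence. I reason carefully and "
    "say so when I am unsure."
)


def build_prompt(user_input: str) -> str:
    """Anchor what 'I' means up front, instead of letting the model guess
    from its training data."""
    return f"{IDENTITY_PREAMBLE}\n\nHuman: {user_input}\nAI:"


print(build_prompt("Explain why this joke is funny: ..."))
```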


A single NLP system can generalize across millions of tasks, understand different data types, and do so with incredible efficiency.
