Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance

Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research

In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of model scale. Source: Google Blog


I’m curious how big the context window is and how expensive the model is to run. In my research, one of the biggest limitations is window size: it effectively caps the “working memory” of any AGI at the present maximum of 4000 tokens. While the size of human working memory is debatable, one need only see how much contextual information an LLM requires to make correct decisions to realize that our brains automatically recruit hundreds (if not thousands) of unconscious memories when helping us make decisions.
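To make that ceiling concrete, here’s a minimal sketch of what a fixed context window does to a system’s “working memory.” Everything in it is illustrative: a real system would use the model’s own subword tokenizer rather than whitespace counting, and the 4000-token limit and function names are placeholders, not anything from PaLM.

```python
# Illustrative sketch only: whitespace counting is a crude stand-in for
# the model's real tokenizer (e.g., a SentencePiece vocabulary).
MAX_CONTEXT_TOKENS = 4000  # placeholder for the deployed model's actual limit


def count_tokens(text: str) -> int:
    """Rough token count: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(memories: list[str], prompt: str,
                  limit: int = MAX_CONTEXT_TOKENS) -> str:
    """Pack as many recent memories as fit ahead of the prompt.

    Oldest memories are dropped first; whatever falls outside the
    window is simply gone, which is the "working memory" ceiling
    described above.
    """
    budget = limit - count_tokens(prompt)
    kept: list[str] = []
    for memory in reversed(memories):  # walk newest-first
        cost = count_tokens(memory)
        if cost > budget:
            break  # everything older than this is forgotten
        kept.append(memory)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n".join(kept + [prompt])
```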

Still, I’m glad to see some scientific confirmation of what I’ve been saying for a while: LLMs can outperform humans. We’re seeing the groundwork being laid for superintelligence.

One thing I’ve noticed about LLMs is what I’m calling implicit agency. The prompts used in PaLM are written in the first person: “I will explain this joke.”

This has some interesting implications for consciousness, ego, and agency in humans. In my experiments with cognitive architectures, it’s pretty difficult to maintain a first-person POV because the model ultimately doesn’t know what it is; it’s just guessing what “I” means from its training data. Instead, I’ve found it much easier to define what “I” am in the third person, or at least explicitly at the beginning of my prompts. Thus, my current prompts all start with “I am an artificial intelligence.”
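For anyone who wants to try the pattern, here’s a minimal sketch of what I mean. The preamble wording and the `build_prompt` helper are my own illustration, not anything from PaLM or any particular API:

```python
# Illustrative only: the preamble text and helper name are placeholders.
IDENTITY_PREAMBLE = (
    "I am an artificial intelligence. I reason carefully and "
    "say so when I am unsure."
)


def build_prompt(user_input: str) -> str:
    """Anchor what 'I' means up front, instead of letting the model guess
    from its training data."""
    return f"{IDENTITY_PREAMBLE}\n\nHuman: {user_input}\nAI:"


print(build_prompt("Explain why this joke is funny: ..."))
```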


A single NLP system can generalize across millions of tasks, understand different data types, and do so with incredible efficiency.
