[Paper] Algorithmic progress in language models

anon22939549 · March 15, 2024, 12:12am

Abstract

We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore’s Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
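To get a feel for what an 8-month halving time implies, here is a minimal arithmetic sketch. It is not the paper's method (the paper fits augmented scaling laws); it simply compounds a constant halving time. The function name `compute_reduction_factor`, the 11-year span (2012 to 2023), and the 24-month doubling time used as a stand-in for Moore's Law are assumptions made for illustration.

```python
def compute_reduction_factor(years: float, halving_months: float) -> float:
    """Factor by which the compute needed for a fixed performance level
    shrinks after `years`, assuming a constant halving time (hypothetical helper)."""
    return 2.0 ** (years * 12.0 / halving_months)


# Halving times quoted in the abstract: ~8 months, 95% CI of roughly 5 to 14 months.
ALGORITHMIC_HALVING_MONTHS = {
    "point estimate": 8.0,
    "CI fast end": 5.0,
    "CI slow end": 14.0,
}

# Assumption: Moore's Law treated as a doubling every 24 months.
MOORE_DOUBLING_MONTHS = 24.0

# Assumption: the 2012-2023 dataset span is treated as 11 years.
SPAN_YEARS = 11.0

if __name__ == "__main__":
    for label, months in ALGORITHMIC_HALVING_MONTHS.items():
        factor = compute_reduction_factor(SPAN_YEARS, months)
        print(f"algorithms, {label} ({months:g} mo): ~{factor:,.0f}x less compute")

    hardware = compute_reduction_factor(SPAN_YEARS, MOORE_DOUBLING_MONTHS)
    print(f"hardware at {MOORE_DOUBLING_MONTHS:g} mo doubling: ~{hardware:,.0f}x")
```

The wide spread between the two ends of the confidence interval is why the abstract flags the noisy benchmark data. Compounding a fixed rate over the whole period is only an extrapolation, so the outputs are order-of-magnitude illustrations rather than results from the paper.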
