Check Out DeepMind’s New Language Model, Chinchilla (70B Parameters)

PaulBellow · April 11, 2022, 11:23pm

…[SNIP]… Following the methods outlined above, the suggested 70B Chinchilla consistently and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B). The researchers also found that the three approaches produce comparable predictions for how optimal parameter and token counts scale with FLOPs, even though they use different fitting procedures and different sets of trained models.

Overall, this research contributes to developing an effective training paradigm for large auto-regressive language models under a limited compute budget. It has been standard practice to increase model size without increasing the number of training tokens to match. The team instead recommends doubling the number of training tokens for every doubling of model size. This means that using larger, higher-quality training datasets can lead to better results on downstream tasks. [PAPER] [SOURCE]
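To make the "double the tokens when you double the model" rule concrete, here is a minimal sketch. The function names, the reference model (1B parameters, 20B tokens), and the 6 * N * D estimate of training FLOPs are assumptions added for illustration; none of them come from the post above.

```python
def scaled_tokens(ref_params, ref_tokens, new_params):
    """Scale the token budget in proportion to model size.

    Doubling the parameter count doubles the number of training tokens.
    """
    return ref_tokens * (new_params / ref_params)


def approx_train_flops(params, tokens):
    """Rule-of-thumb training cost, C ~ 6 * N * D (an assumed approximation)."""
    return 6 * params * tokens


# Hypothetical reference point, chosen only for illustration.
REF_PARAMS = 1_000_000_000
REF_TOKENS = 20_000_000_000

if __name__ == "__main__":
    for params in (1_000_000_000, 2_000_000_000, 4_000_000_000):
        tokens = scaled_tokens(REF_PARAMS, REF_TOKENS, params)
        flops = approx_train_flops(params, tokens)
        print(f"params={params:.1e}  tokens={tokens:.1e}  approx FLOPs={flops:.1e}")
```

Under these assumptions, each doubling of model size roughly quadruples the compute estimate, because both factors in the product double together.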

liamjgkelly · April 22, 2022, 7:59am

Woah, what a time to be alive!
