Model index for researchers: are these descriptions correct?

Two questions, if anyone knows:

First Q:

Under “Models referred to as GPT-3.5”, the page lists a series of models trained to a particular data cutoff, presumably sharing some architecture. Should text-embedding-ada-002 be in this list? It uses the same tokenizer, was released in a similar time period, and shares the training-data cutoff. Or is it excluded from the GPT-3.5 series because it is not a completion model?
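For what it’s worth, here is a small sketch of how one might check the tokenizer overlap. The model-to-encoding mapping below is my own reading of the publicly documented tiktoken table (treat it as an assumption, not something from the model index page):

```python
# Model -> tokenizer encoding, per my understanding of the public
# tiktoken model-to-encoding table (an assumption, not official docs).
MODEL_ENCODINGS = {
    "text-embedding-ada-002": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",  # an older GPT-3.5-era completion model
}

def shares_tokenizer(a: str, b: str) -> bool:
    """True if both models map to the same tokenizer encoding."""
    return MODEL_ENCODINGS[a] == MODEL_ENCODINGS[b]

print(shares_tokenizer("text-embedding-ada-002", "gpt-3.5-turbo"))
```

If that mapping is right, the embeddings model shares a tokenizer with gpt-3.5-turbo but not with the older davinci-style completion models, which may be part of the answer.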

Second Q:

Under “Models featured in OpenAI Research”, the page lists, from “Language Models are Few-Shot Learners”:

GPT-3 175B → davinci 175B
GPT-3 6.7B → curie 6.7B
GPT-3 1B → babbage 1B

and ada isn’t listed at all. This doesn’t line up in the obvious way with the parameter counts commonly attributed to the named models:

davinci - 175B parameters
curie - 13B parameters
babbage - 6.7B parameters
ada - 2.7B parameters
(figures citing OpenAI’s “Few-Shot Learners” paper).

OpenAI describes these as the “most proximate models” that the research models correspond to. So are they saying the listed research models are most closely related to the named models, despite having very different parameter counts?



Your stats seem wrong. Independent comparisons of the perplexity scores of the GPT-3 paper’s models against the performance of the named base models suggest this alignment is accurate:

Ada, Babbage, Curie, and Davinci line up closely with 350M, 1.3B, 6.7B, and 175B respectively.

The pretraining data size of each is not published, but 175B should be the one with the 45TB corpus.
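The correspondence above can be written out explicitly. The parameter counts come from the GPT-3 paper’s model table; the pairing of API names to sizes is the alignment claimed here, not official documentation:

```python
# Parameter counts (billions) from "Language Models are Few-Shot Learners";
# the name-to-size pairing is the independently inferred alignment, not
# something OpenAI has published.
PAPER_SIZES = {
    "GPT-3 350M": 0.35,
    "GPT-3 1.3B": 1.3,
    "GPT-3 6.7B": 6.7,
    "GPT-3 175B": 175.0,
}
ALIGNMENT = {
    "ada": "GPT-3 350M",
    "babbage": "GPT-3 1.3B",
    "curie": "GPT-3 6.7B",
    "davinci": "GPT-3 175B",
}

for name, paper_model in ALIGNMENT.items():
    print(f"{name:8s} -> {paper_model} ({PAPER_SIZES[paper_model]}B params)")
```

Note how different this is from the 175B / 13B / 6.7B / 2.7B attribution in the question, which shifts each name one tier too large.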

The new ada embeddings model uses a different output dimensionality than prior GPT models, which suggests a ground-up retraining, perhaps with greater or more specialized knowledge for the task than the prior ada tier, along with tunings one can only evaluate through the performance of the whole model at the endpoint.


Ah, thanks for the clarification. My parameter numbers came from researchers citing the OpenAI paper, and it sounds like they were simply off.