Two questions, if anyone knows -
First Q:
Under “Models referred to as GPT 3.5”, the page lists a set of models trained up to a particular data cutoff, and presumably sharing some architecture. Should text-embedding-ada-002 be in this list? It uses the same tokenizer, came out around the same time, and shares the same training data cutoff. Or is it excluded from the GPT 3.5 series because it is not a completion model?
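(For context on the tokenizer point: here’s a minimal sketch using the tiktoken library to compare which BPE encoding each model maps to. This is just an illustration and assumes the installed tiktoken version recognizes these model names.)

```python
import tiktoken  # pip install tiktoken

# Compare the BPE encoding each model name maps to.
# Assumption: these model names are known to the installed tiktoken version.
for model in ["text-embedding-ada-002", "gpt-3.5-turbo", "text-davinci-003"]:
    enc = tiktoken.encoding_for_model(model)
    print(f"{model}: {enc.name}")

# With current tiktoken releases this prints:
#   text-embedding-ada-002: cl100k_base
#   gpt-3.5-turbo: cl100k_base
#   text-davinci-003: p50k_base
```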
Second Q:
Under “Models featured in OpenAI Research”, the page lists, from “Language Models are Few-Shot Learners”:

| Paper model | API model | Parameters |
|---|---|---|
| GPT-3 175B | davinci | 175B |
| GPT-3 6.7B | curie | 6.7B |
| GPT-3 1B | babbage | 1B |
and ada isn’t listed at all. This doesn’t line up in the obvious way with the named models’ published parameter counts:

- davinci - 175B parameters
- curie - 13B parameters
- babbage - 6.7B parameters
- ada - 2.7B parameters
(see https://arxiv.org/pdf/2106.07131.pdf, which cites OpenAI’s “Few-Shot Learners” paper for these counts).
OpenAI describes the named models as the “most proximate models” that the research models correspond to. So are they saying that the listed research models are most closely related to those named models, even though the parameter counts are very different?
Thanks!