Two questions, if anyone knows -
First Q:
Under “Models referred to as GPT 3.5”, the page lists a set of models trained up to a particular data cutoff, and presumably sharing some architecture. Should text-embedding-ada-002 be in this list? It uses the same tokenizer, came out around the same time, and shares the same training data cutoff. Or is it excluded from the GPT 3.5 series because it is not a completion model?
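(For context on the tokenizer point: here’s a minimal sketch using the tiktoken library to compare which BPE encoding each model maps to. This is just an illustration and assumes the installed tiktoken version recognizes these model names.)

```python
import tiktoken  # pip install tiktoken

# Compare the BPE encoding each model name maps to.
# Assumption: these model names are known to the installed tiktoken version.
for model in ["text-embedding-ada-002", "gpt-3.5-turbo", "text-davinci-003"]:
    enc = tiktoken.encoding_for_model(model)
    print(f"{model}: {enc.name}")

# With current tiktoken releases this prints:
#   text-embedding-ada-002: cl100k_base
#   gpt-3.5-turbo: cl100k_base
#   text-davinci-003: p50k_base
```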
Second Q:
Under “Models featured in OpenAI Research”, the page lists, from “Language Models are Few-Shot Learners”:

| Paper model | API model | Parameters |
|---|---|---|
| GPT-3 175B | davinci | 175B |
| GPT-3 6.7B | curie | 6.7B |
| GPT-3 1B | babbage | 1B |
and ada isn’t listed at all. This doesn’t line up in the obvious way with the named models’ published parameter counts:

- davinci - 175B parameters
- curie - 13B parameters
- babbage - 6.7B parameters
- ada - 2.7B parameters
(see https://arxiv.org/pdf/2106.07131.pdf, which cites OpenAI’s “Few-Shot Learners” paper for these counts).
OpenAI describes the named models as the “most proximate models” that the research models correspond to. So are they saying that the listed research models are most closely related to those named models, even though the parameter counts are very different?
Thanks!