What is the size of the training set for GPT-3

_j · September 8, 2023, 10:38pm

I can show that the smaller GPT-3 models were trained on less data, with material deliberately omitted so as not to overfit. We can test what they know.

ada-001 (instruction-following tune):
We’ll try to make it complete the most obvious line of a song:

The output skips to the next line, with the correct word, “in”, at only 2.32%.
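For anyone reproducing this, here is a minimal sketch of how a percentage like the one above can be derived from token log probabilities. It assumes the per-token table is available as a plain dict of token to natural-log probability (the shape the legacy completions endpoint used for `top_logprobs`, as far as I recall). The helper name `token_percent` and the one-entry `example` dict are hypothetical; the only number in it is the 2.32% quoted above.

```python
import math


def token_percent(top_logprobs, token):
    """Return the probability of `token` as a percentage, or None if it is not listed.

    `top_logprobs` is assumed to map token strings to natural-log probabilities.
    """
    logprob = top_logprobs.get(token)
    if logprob is None:
        return None
    return 100.0 * math.exp(logprob)


# Hypothetical table holding only the figure quoted in the post (2.32% for " in").
example = {" in": math.log(0.0232)}

print(token_percent(example, " in"))   # probability of " in" as a percentage
print(token_percent(example, " out"))  # token not in the table
```

Note that the leading space is part of the token, so “ in” and “in” are different keys.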

Oops, I meant to use the base model ada for completions; here’s that:

The result without poem tuning is just uninformed repetition, with “in” at 10%.

So we iterate until we find knowledge. babbage rates “in” about as likely as a comma, but a new line wins by far (after which it repeats the input):

I tried to make it even more obvious for curie, “just give me the last two words of a line”, but it is still wrong:

Finally davinci can do it, with its 25x jump in model parameters.
We can see that having the knowledge gives the model a massive jump to 97.7%:
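As a rough check on the 25x figure, here is a small sketch of the arithmetic. It assumes curie and davinci correspond to the 6.7B and 175B parameter models described in the GPT-3 paper; that mapping is a common assumption, not something stated in this post, and the constant names are made up for illustration.

```python
# Assumed parameter counts (GPT-3 paper sizes; the mapping to API model names is an assumption).
CURIE_PARAMS = 6.7e9
DAVINCI_PARAMS = 175e9

# Ratio of davinci to curie parameter counts under that assumption.
ratio = DAVINCI_PARAMS / CURIE_PARAMS

print(f"davinci / curie parameter ratio: {ratio:.1f}x")
```

Under that assumption the ratio comes out a little above 25, so the figure is in the right range.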

Interestingly, the new babbage-002 gives hints that it was trained on a large dataset, but what we actually get is extreme perplexity and quick degradation, often making it worse than ada for general tasks:

(Did they just take their davinci-002 and give it 1.5 bit resolution?)
