The GPT-3 paper says the models were trained on filtered Common Crawl, WebText2, Books1, Books2, and Wikipedia.
Is either of the books datasets (Books1 or Books2) Project Gutenberg? If not, is there any public information about what they contain?
Thanks.