A potential new source of training data!?

This is not directly about the API per se, but it’s germane to OpenAI anyway:

Why don't Sam & Co. strike a deal with the US Government to digitize the entire Library of Congress? There would presumably be some rights issues to deal with, of course. However, if such a large corpus of high-quality tokens would let our models acquire even more high-level skills, then perhaps it should be considered a matter of national security, since it's all about the "AI arms race" now.

Just my 2c…

Hi @andrewsilber,

I don't know the actual size of the Library of Congress; however, much of it has already been digitized and is available on the Library of Congress's "Digital Collections, Available Online" page.