This initiative would invite authors and rights-holders to donate their books, treatises, and dissertations, particularly long-form works, for non-commercial use in training OpenAI’s models, without reproduction of the works themselves. All contributions would be:
- Explicitly opt-in
- Legally cleared
- Ethically aligned
It would provide:
- A competitive and strategic edge
- High-quality, structured, long-form training data unavailable to competitors
- Diverse authorial perspectives beyond mass web data
- A transparent and defensible alternative to scraped or disputed sources
- A public-facing alignment with OpenAI’s mission to benefit humanity
This also presents a first-mover opportunity: no major LLM developer has yet formalized a channel for this kind of ethical corpus-building, and many authors might gladly contribute if the terms were clear and the purpose were oriented toward public benefit.
Any feedback or suggestions are welcome, and if you find this idea valuable, please consider sharing it with others who might help amplify it or route it to the appropriate team.