I’m not OpenAI but they seem to be very very scared of others distilling their models off-platform. They’re worried about someone recording reasoning summaries and then training other models on them in place of reasoning tokens.
So many of the “open weights” models are just distillations of OpenAI models, so the fear isn’t unfounded. They don’t want their reasoning models, o3, and image generation copied before they already face real competition.
OpenAI’s business model isn’t just selling the platform and the compute. They’re selling access to LLMs that only they have. So naturally, the realization that you can just train on their outputs and walk away with basically the same thing terrifies them. Hence, dystopian ID checks.
It isn’t very “open” but whatevs… we have affordable API for most other stuff so it could always be worse.
TLDR: They don’t want DeepSeek taking notes that are too detailed.