OpenAI - We need to talk about org verification

Today, I’m going to demonstrate that OpenAI’s org verification is ineffective and an intrusion on our privacy, despite privacy supposedly being at the “core” of their products. Hopefully, OpenAI will make the right decision and put this ID-checking practice in the past.

Background

For those unaware, OpenAI is terrified of their models being distilled off-platform. Distillation is the process of harvesting output data from an LLM and then using that output data to train a different LLM. When DeepSeek came about, OpenAI was furious. Their spokesperson stated, “DeepSeek may have inappropriately distilled our models,” and, “[w]e take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”
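
To make that concrete, here’s a minimal sketch of the harvesting step using OpenAI’s Python SDK. The prompts, file name, and choice of teacher model are placeholders I made up; the point is that “distillation data” is nothing more than saved prompt/response pairs.

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompts; a real harvesting run would use millions of them.
prompts = ["Explain quicksort.", "Summarize the French Revolution."]

with open("distill_train.jsonl", "w") as f:
    for prompt in prompts:
        # Step 1: harvest an output from the teacher model.
        resp = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Step 2: save the pair in chat fine-tuning format, ready to
        # train a different (student) model on later.
        record = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record) + "\n")
```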

Yes, as if they had completely forgotten about their own pre-training process, OpenAI stated publicly that they are calling up the US government because DeepSeek trained a model off GPT-4 outputs.

OpenAI fights distillation

Frightened by the news that closed-weights models aren’t as special as they thought, OpenAI sprang into action. They partnered with a company called Persona, an ID-checking firm that is far less known, and definitely shadier, than other industry leaders.

So why is Persona better than reputable industry leaders like Stripe? Let’s look at their privacy policy!

Uncovering the ugly truth

The images obtained from government identification document and photos of your face that you upload, and data from scans of facial geometry extracted from the government identification document and photos of your face that you upload, are collected, used and stored directly by Persona on behalf of Customer as Customer’s service provider through Customer’s website or app that you accessed.

Persona will permanently destroy data from scans of facial geometry extracted from the photos of your face that you upload upon completion of Verification or within three years of your last interaction with Persona, consistent with the Customer’s instructions unless Persona is otherwise required by law or legal process to retain the data.

When you upload your ID and scan your face, all of that is stored by Persona. Your face scans are deleted after three years at the most, but your ID is presumably saved indefinitely. Because that’s definitely necessary.

But at least all they’re doing is confirming that you aren’t lying about who you are, and they wouldn’t share this sensitive information with their customer OpenAI… right?

[Retrieve a Government Id Verification]

Welp. Turns out they can retrieve your government ID through Persona’s API just as easily as you can query an answer from gpt-4.1 on OpenAI’s API.
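
For illustration, here’s roughly what that retrieval looks like. I’m going off the title of Persona’s API reference page, so treat the exact URL, headers, and field names below as assumptions, not gospel:

```python
import os

import requests

# Sketch of Persona's "Retrieve a Government ID Verification" endpoint.
# The path mirrors Persona's public API reference; the verification ID
# and response field names are illustrative, not confirmed.
api_key = os.environ["PERSONA_API_KEY"]
verification_id = "ver_ABC123"  # hypothetical completed verification

resp = requests.get(
    f"https://withpersona.com/api/v1/verification/government-ids/{verification_id}",
    headers={"Authorization": f"Bearer {api_key}"},
)
resp.raise_for_status()

# JSON:API-style payload: the attributes carry the fields extracted from
# the uploaded document -- i.e., everything printed on your ID.
attrs = resp.json()["data"]["attributes"]
print(attrs.get("name-first"), attrs.get("name-last"))
```

One GET request with an API key, and out comes a customer’s identity document.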

So, what are the issues with Persona here?

Persona, and OpenAI’s partnership with it, is undoubtedly shady. You would expect that when you verify your identity, the information you provide is used only to confirm who you are and isn’t retained or shared around. But this plainly isn’t the case. When you perform verification in your OpenAI account, here’s what happens:

  1. Persona saves your documents indefinitely.
  2. OpenAI has the ability to programmatically retrieve your documents from Persona using a well-documented API.
  3. And this is all supposedly so OpenAI can be assured you aren’t storing outputs to train with them later. Yikes.

OpenAI’s org verification: malicious, or hysterics?

Obviously, OpenAI granting itself the same investigative privileges as the government, because o3 is apparently just that good (it is not), is extremely invasive.

Now, it’s impossible for us to tell whether OpenAI has a more malicious hidden motive, or whether they seriously, truly think that ID checks are enough to stop an armed and dangerous distiller. However, as an exercise to show that org verification really doesn’t do anything, we can brainstorm ways DeepSeek (or the “P.R.C.,” as OpenAI calls them) could easily achieve distillation despite these checks, rendering them completely useless.

  • An adversary could simply complete the ID checks, or pay others to do them.
  • An adversary could use stolen accounts or API keys (and give OpenAI an actual reason to call the cops this time).
  • An adversary could type “site:chat.openai.com/share” into Google. Or, more realistically, automate something similar (sketched after this list).
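
To be clear about that last bullet, here’s a hedged sketch. It assumes the adversary already has a list of indexed share links (from a search API, a crawl, or the dork above); the URL below is hypothetical:

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical share links, e.g. harvested from a search engine's index.
share_urls = [
    "https://chat.openai.com/share/example-conversation-id",
]

for url in share_urls:
    page = requests.get(url, timeout=10)
    if page.status_code != 200:
        continue  # link was taken down or never existed
    # Shared conversations are public pages: scraping their visible text
    # builds a distillation corpus with no account and no ID check.
    text = BeautifulSoup(page.text, "html.parser").get_text(" ", strip=True)
    print(text[:200])
```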

Admittedly, these aren’t the easiest things to do, so you and I with our $10 budgets certainly won’t be distilling OpenAI models anytime soon. But what about a nation-state adversary that appears to have already distilled OpenAI’s reasoning models? Let me know what you think.

OpenAI hiding limitations

This is a bit of a bonus rant, so I’ll divide it into sections in case you’re interested.

Org verification requirement is now omitted from announcements and emails.

When OpenAI released org verification, they were upfront in announcements and emails that certain offerings would require it. But in recent announcements and emails, disappointingly, OpenAI routinely fails to mention ID-check requirements, even though they manage to disclose pricing. I guess invasive and unnecessary ID checking is a bad look?

GPT-4 was originally falsely advertised as having vision capabilities.

This is unfortunately not a new problem. When OpenAI touted that their work-in-progress GPT-4 could see images, users flocked to upgrade to Plus as soon as GPT-4 was added to the subscription. However, OpenAI failed to mention anywhere that GPT-4 in ChatGPT could not actually accept image inputs. Coupled with constant uptime issues and their non-existent support team (at the time), users filed chargebacks with their banks en masse. The Stripe payment gateway on OpenAI’s platform mysteriously shut down for a day or two during this. I won’t speculate… but I believe the ban hammer spoketh that day.

ChatGPT is no longer imperfect and can’t make mistakes.

There was also a time when ChatGPT had a helpful popup making you acknowledge that AI could produce harmful or inaccurate responses. Then it turned into a little disclaimer underneath the text input. Now it’s gone entirely; test it yourself in incognito mode. The result? Average users who don’t know what AI is or how it works are in for a surprise when it produces seemingly malicious translations or claims to be alive. OpenAI didn’t remove this text for lack of funding to maintain it. They removed it because it’s bad for business.

An OpenLetter to OpenAI

I think OpenAI has some amazing technology with an amazing interface, and it makes good on its mission to provide AI that benefits humanity. Google’s interface and documentation are messy and confusing; OpenAI’s is clean and tidy. Google follows the trend in AI; OpenAI defines it.

There is no doubt that OpenAI is making incredible history here. So when I see OpenAI begin an announcement with an awesome quote from the COO, like…

Trust and privacy are at the core of our products.

… I really would like to think that OpenAI truly means it. But asking customers to hand over their government ID - to be stored and retrievable indefinitely, just to access o3, reasoning summaries, and the new image generation, all announced like any of OpenAI’s other offerings - seems to me like a sizeable contradiction, especially since DeepSeek clearly wasn’t slowed down by it.

So, please retire org verification and pursue other ways to beat the competition. It isn’t effective and it undermines the privacy of all who submit to it.
