API - Is our data really "ours"? Major Concern in Data Processing Addendum

OpenAI clearly says that “We do not train on your business data (data from ChatGPT Team, ChatGPT Enterprise, or our API Platform)”, and that “you own your inputs and outputs”.

However, the Data Processing Addendum at https://openai.com/policies/data-processing-addendum/, which governs data processing for the API and ChatGPT Enterprise, contains a clause that says:

For clarity, OpenAI may continue to process information derived from Customer Data that has been deidentified, anonymized, and/or aggregated such that the data is no longer considered Personal Data under applicable Data Protection Laws and in a manner that does not identify individuals or Customer to improve OpenAI’s systems and services.

So, OpenAI apparently DOES use data sent to the API after deidentifying it, and this deidentified data can be used to improve their systems, which could cause a lot of problems from a legal perspective. Has anyone had to work through this issue?


IANAL obviously, but I think that clause pertains mostly to things like geographical usage data and similar aggregate statistics.

However, it’s understandable that this vague language raises concerns about opening a lot of unwanted doors. If customer data privacy is a high priority (as it rightly should be), then I’d recommend taking a look at the Azure offerings for your API needs 🙂


I’d guess lots of people will soon be dealing with this. I agree with looking for an alternative option, especially where legal exposure and proprietary information are involved. Right now there are so many developers pumping their employers’ data and users’ personal chats into the OpenAI system without understanding the reality of what they are doing that there is sure to be a fallout at some point.


Why would anyone actually believe any tech company that says it won’t milk as much value out of your data as possible? Of course they will. You can ignore whatever their “agreements” say: if you plan to keep any of your data genuinely private, don’t send it out over the web to one of these cloud companies. The only truly “private” LLMs are the ones you run locally.


What makes it so challenging to run locally without any cloud infrastructure is the size of the models. A Llama 3 70B, which is good but can’t be compared to GPT-4o, would require about 280GB for inference at full 32-bit precision (70B parameters × 4 bytes); even aggressive quantization still demands serious hardware. Not using the cloud is a death sentence for many applications.

I agree there’s no economical way to run local LLMs at that scale. However, I wonder if some form of obfuscation/anonymization could be used so that the data sent to a cloud LLM is still usable but is anonymized before it ever goes over the network. For example, you could have “John Doe” sent to the LLM as “Joe Fox”. You’d then be protecting your private information by feeding the LLM substitute data (rather than trusting them to anonymize it), while still getting results you can transform back into useful output.

Of course, for words whose location in semantic vector space is significant, you can’t anonymize like that. But you can at least anonymize things like phone numbers, emails, company names, and people’s names in a way that lets you “unscramble” the output and rematch it to the correct info when you get the results.
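As a minimal sketch of that scramble/unscramble idea: the class below swaps regex-detectable PII (emails and phone numbers only; reliably catching people’s and company names would need a real NER model, which is out of scope here) for reversible placeholders before the text leaves your machine, then maps the placeholders back afterwards. All names and patterns are illustrative assumptions, not any particular library’s API.

```python
import re

class Pseudonymizer:
    """Replace easily-detected PII with reversible placeholders
    before sending text to a cloud LLM, then restore it afterwards."""

    def __init__(self):
        self.reverse = {}   # placeholder -> original value
        self.forward = {}   # original value -> placeholder
        self.counters = {}  # per-kind placeholder counter

    def _placeholder(self, kind, value):
        # Reuse the same placeholder if the value was seen before.
        if value not in self.forward:
            n = self.counters.get(kind, 0) + 1
            self.counters[kind] = n
            ph = f"{kind}_{n}"
            self.forward[value] = ph
            self.reverse[ph] = value
        return self.forward[value]

    def scramble(self, text):
        # Emails first, so their digits aren't half-eaten by the phone pattern.
        text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+",
                      lambda m: self._placeholder("EMAIL", m.group()), text)
        # Rough phone pattern: optional +, then 9+ digits with separators.
        text = re.sub(r"\+?\d[\d\s().-]{7,}\d",
                      lambda m: self._placeholder("PHONE", m.group()), text)
        return text

    def unscramble(self, text):
        # Longest placeholders first so PHONE_10 isn't clobbered by PHONE_1.
        for ph in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(ph, self.reverse[ph])
        return text
```

Usage would look like `masked = p.scramble(prompt)`, send `masked` to the API, then `p.unscramble(reply)` on the response. The weak point is exactly the one noted above: anything the model needs to reason about semantically (job titles, addresses as locations, names with cultural context) can’t be swapped blindly without degrading the answer.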


“Privacy is dead! Long-Live Privacy!”