Hi all, this is an in-depth question about OpenAI’s current terms of use and policy documents on confidentiality. There is a contradiction between the “we care about data security/privacy” picture painted by the policy documents and what is actually in the terms of use. It would be useful to get reasoned input and discussion going on this contradiction, beyond the usual “here is a link to the terms of use, privacy and data use policy documents” reply.
Clause 5(a) of the terms of use states:
5(a) Confidentiality. You may be given access to Confidential Information of OpenAI, its affiliates and other third parties. You may use Confidential Information only as needed to use the Services as permitted under these Terms. You may not disclose Confidential Information to any third party, and you will protect Confidential Information in the same manner that you protect your own confidential information of a similar nature, using at least reasonable care.
Note that this confidentiality clause is one-sided: it does not obligate OpenAI to keep input prompt data confidential or to treat it confidentially in any way.
In contrast, the “How your data is used” policy document paints a picture that OpenAI is under a clear obligation to keep (non-personal) input prompt data confidential, where it states:
We know that data privacy and security are critical for our customers. We take great care to use appropriate technical and process controls to secure your data. We remove any personally identifiable information from data we intend to use to improve model performance. We also only use a small sampling of data per customer for our efforts to improve model performance. For example, for one task, the maximum number of API requests that we sample per customer is capped at 200 every 6 months.
We understand that in some cases you may not want your data used to improve model performance. You can opt out of having your data used to improve our models by emailing your request, along with your [organization ID], to support@openai.com. Please note that in some cases this will limit the ability of our models to better address your specific use case.
Opting out appears only to prevent your input data from being sampled for model improvement; it does not stop OpenAI from storing your input prompts and completions indefinitely.
Examples (a minimal code sketch of such a call follows below):
A dev team sending input prompts to the GPT3 API that include unstructured confidential (non-personal) data in order to parse it and bring structure to it. OpenAI stores this data on its servers indefinitely.
A dev shop sending input prompts to the GPT3 API to generate tedious bits of code and speed up their project delivery time. OpenAI stores this code on its servers indefinitely.
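To make these examples concrete, here is a minimal sketch of the kind of call involved, using the pre-1.0 openai Python library. The prompt text, variable names and model choice are illustrative assumptions, not taken from any real project:

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

# Hypothetical unstructured text; in practice this could be confidential
# (non-personal) business data the team wants parsed into structured form.
raw_text = "Invoice 4417 from Acme Ltd, due 2023-03-01, total 12,500 EUR."

# The entire prompt, confidential content included, is transmitted to and
# processed on OpenAI's servers.
response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model choice
    prompt=(
        "Extract the invoice number, vendor, due date and total as JSON:\n\n"
        + raw_text
    ),
    max_tokens=256,
    temperature=0,
)

print(response["choices"][0]["text"])
```

Nothing in the code controls what happens to raw_text after the request is made; retention, and whether it is sampled for model improvement, are governed entirely by the terms and policies quoted above.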
This contradiction between the picture the policy documents paint and the one-sided confidentiality clause makes it tricky for internal dev teams to sell the idea of using the GPT3 API to management.
Another aspect that further reduces legal certainty around OpenAI’s confidentiality obligations is that clause 3(a) of the terms of use states:
(a) Your Content. You may provide input to the Services (“Input”), and receive output generated and returned by the Services based on the Input (“Output”). Input and Output are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.
Effectively, if all rights (which would include any copyright in the input and output confidential data) are owned by and/or assigned to the user, and the user has opted out of their data being used for training, is there not a risk that a user could commence a copyright infringement suit to get a court order requiring OpenAI to delete user-owned, opted-out confidential input/output data that accordingly does not fall under the consented umbrella of “using the Content as necessary to provide and maintain the Services”?
Microsoft’s Azure OpenAI terms of use are different
It is perhaps very telling that the terms under which Microsoft offers Azure OpenAI services (Data, privacy, and security for Azure OpenAI Service - Azure AI services | Microsoft Learn) make it VERY clear that data is (i) not stored for more than 30 days and (ii) not used for training Microsoft models. The relevant passages are copied below:
Training data provided by the customer is only used to fine-tune the customer’s model and is not used by Microsoft to train or improve any Microsoft models.
The requests & response data may be temporarily stored by the Azure OpenAI Service for up to 30 days. This data is encrypted and is only accessible to authorized engineers for (1) debugging purposes in the event of a failure, (2) investigating patterns of abuse and misuse or (3) improving the content filtering system through using the prompts and completions flagged for abuse or misuse.
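For comparison, pointing the same library at an Azure OpenAI resource (so that Microsoft’s 30-day retention terms apply) is, as far as I can tell, mostly a client-side configuration change. The resource name, deployment name and API version below are placeholders, not real values:

```python
import openai

# Route requests to an Azure OpenAI resource instead of api.openai.com;
# the data-handling difference is contractual, not code-level.
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder resource
openai.api_version = "2022-12-01"                            # example API version
openai.api_key = "..."                                       # Azure resource key

response = openai.Completion.create(
    engine="my-davinci-deployment",  # Azure uses deployment names, not model names
    prompt="Extract the invoice number, vendor, due date and total as JSON: ...",
    max_tokens=256,
)

print(response["choices"][0]["text"])
```

If switching really is that cheap on the client side, the clearer Azure terms make the contrast with OpenAI’s own terms all the more striking.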
Sorry for the long post, but I think the data confidentiality point needs discussion that can’t be resolved with a simple “here’s a link to the terms of use and policy documents” reply. We are starting to see a greater influx of GPT3 API powered apps being made available, and I very much doubt any of them are on the Azure OpenAI infrastructure, which has a much clearer and more useful data confidentiality approach.