If I use a corpus that includes my company's confidential information to create embeddings, is there any risk of that confidential information leaking?
What happens to the original corpus after processing? I would also like to know about fine-tuning and other capabilities.

For details on our data policy, please see our Privacy Policy and Terms of Use documents. Please also see how your data is used and more information here.


(Pardon the resurrection, but this seems like an important topic).

As of May 7, 2023, the page "How your data is used to improve model performance | OpenAI Help Center" reads:

"OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering. In order to support the continuous improvement of our models, you can fill out this form to opt-in to share your data with us."

So it sounds like they're saying that usage through the API doesn't get fed back into their models, but non-API usage (like ChatGPT) does.

So I guess the API policy applies to the Embeddings API as well? And that would imply that any data we send via the Embeddings endpoint is processed but not used for training by OpenAI? (Note that the quoted statement only covers training, not retention.) Does anyone know if this holds up to HIPAA requirements? Is there any way this can be confirmed beyond just taking their word for it? (If you're building a startup that is considering passing proprietary data to the embeddings endpoint, it'd be handy to have something to tell investors to give them confidence that we aren't accidentally and irreversibly giving away the secret sauce.)

Anyone have any more insight?