Security around client data

Hi guys,

I’m concerned about using GPT-3 and exposing client data. In other words, I do not want to upload training data or post prompts to OpenAI’s API that contain any client information. There are many ways to identify numbers, codes, email addresses, names, and locations using regexes and small NER models, but I’ll bet many others have already faced this issue and come up with good solutions, so I thought I’d quickly ask on the forum.
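
To illustrate the regex side, here’s a rough sketch of the kind of pattern-based scrubbing I have in mind. The patterns and labels are simplified examples I made up for illustration, not production-grade PII detection:

```python
import re

# Illustrative patterns only; real identifiers will need tighter rules.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    "ID_CODE": re.compile(r"\b[A-Z]{2,}\d{4,}\b"),  # e.g. case or file numbers
}

def scrub(text):
    # Replace every match with a bracketed label so the structure of the
    # sentence survives for the completion engine.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Contact john.brown@example.com about case ref AB12345."))
# -> Contact [EMAIL] about case ref [ID_CODE].
```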

For example, I want to use GPT-3 to analyze text such as “Mr. John Brown alleges homeless people started a fire in his vacant home somewhere between 9 and 12 December 2021.” I don’t want to send the name “John Brown” to the OpenAI completion endpoint for analysis; I’d rather replace it with a pseudonym or blank it out completely.
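
For the names and locations, a small NER model can handle the replacement. Here’s a rough sketch using spaCy’s small English model; the bracketed placeholder scheme is my own invention, and a small model will inevitably miss some entities:

```python
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def pseudonymize(text):
    """Replace person and location entities with numbered placeholders.

    Returns the redacted text plus a mapping so the original names can
    be restored in the completion afterwards.
    """
    doc = nlp(text)
    mapping = {}
    # Work backwards through the entities so earlier character offsets
    # remain valid while we splice in the placeholders.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ in ("PERSON", "GPE", "LOC"):
            placeholder = mapping.setdefault(
                ent.text, f"[{ent.label_}_{len(mapping) + 1}]"
            )
            text = text[: ent.start_char] + placeholder + text[ent.end_char :]
    return text, mapping

redacted, names = pseudonymize(
    "Mr. John Brown alleges homeless people started a fire in his vacant "
    "home somewhere between 9 and 12 December 2021."
)
print(redacted)  # e.g. "Mr. [PERSON_1] alleges ... 9 and 12 December 2021."
```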

Does anybody know of an existing tool that removes or blanks out sensitive data from text?

It sounds like a challenge, since you need to filter the text before sending it to the Engine. That will require either using another engine, possibly offline, to analyze and change the text, or building filters by hand and implementing them in your code.
However, if the sensitive data only needs to be removed from the completion and can still be sent within the prompt, you can train the model by providing examples and/or use the instruct Engines with the removal added as an instruction.
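
Roughly like this, using the openai Python client; the engine name and instruction wording here are placeholders, not a tested recipe:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Instruct the engine to analyze the text without echoing personal names.
# Note: the sensitive data still travels to the API in the prompt; this
# only keeps it out of the completion.
response = openai.Completion.create(
    engine="davinci-instruct-beta",  # placeholder instruct engine
    prompt=(
        "Summarize the allegation below. Do not include any personal "
        "names in your answer; refer to people by role only.\n\n"
        "Text: Mr. John Brown alleges homeless people started a fire in "
        "his vacant home somewhere between 9 and 12 December 2021."
    ),
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"])
```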
In any case, if you find an effective way to filter the data before sending it, please share.

Thanks @NSY, I’ll definitely share if I find something that’s open source, but I’ve got a feeling it’s something we’ll simply have to develop ourselves for our specific use case.

Thanks. Privacy is a concern in other use cases as well; it’s worth having a public discussion about its different aspects and about what happens to the data after it is sent to the Engine.
