Data Privacy and Limitations

Needless to say, I would never give GPT (or otherwise post) credit card numbers, medical records, Social Security numbers, or anything like that.

As a privacy advocate, I even anonymize code snippets before I post them on peer support sites, removing the app name, customer names, and any ‘good tricks’ I might have in the code.
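For what it’s worth, that kind of scrubbing can be done mechanically before posting. A minimal sketch (the term list and `REDACTED_n` placeholder scheme are just examples I made up, not any standard):

```python
import re

# Names to scrub from a snippet before posting it publicly.
# (Example values only -- in practice this would come from your own project.)
SENSITIVE_TERMS = ["AcmeApp", "ContosoMotors", "jsmith"]

def scrub(snippet: str) -> str:
    """Replace each sensitive term with a stable, neutral placeholder."""
    for i, term in enumerate(SENSITIVE_TERMS, start=1):
        snippet = re.sub(re.escape(term), f"REDACTED_{i}", snippet, flags=re.IGNORECASE)
    return snippet

print(scrub("Dim conn = AcmeApp.Connect('jsmith')"))
# Dim conn = REDACTED_1.Connect('REDACTED_3')
```

Because each term always maps to the same placeholder, the posted code still reads coherently.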

I have not been so cautious with GPT. I happily give it pages of unredacted code, assuming that it won’t be posted to the public web and that OpenAI isn’t going to steal my VB subs. And I feel good about that.

But now I have a borderline situation regarding client data:

I have a client who has folders of PDF and DOCX invoices, and I was thinking of handing each file to GPT to extract the invoice data and return it to me in CSV format, so I can build DB tables out of it.

It’s actually car-repair info, not medical records, but…
it does involve third-party info: specifically his customers’ names and addresses, and their cars. Then again, vehicle registration data and driver names and addresses, most of that is already on the web someplace.
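One middle-ground option I’ve been considering (my own sketch, not an OpenAI recommendation): pseudonymize the customer names and addresses before each file’s text goes to the API, keep the token-to-value mapping locally, and swap the real values back into the returned CSV. Roughly:

```python
# Sketch: replace customer PII with stable tokens before sending text out,
# keep the token->value map locally, restore values in the returned CSV.
def pseudonymize(text, pii_values):
    mapping = {}
    for i, value in enumerate(pii_values, start=1):
        token = f"CUSTOMER_{i}"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping

def restore(text, mapping):
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

invoice = "Invoice 1042: John Doe, 12 Elm St - brake pads, $240"
safe, mapping = pseudonymize(invoice, ["John Doe", "12 Elm St"])
# safe == "Invoice 1042: CUSTOMER_1, CUSTOMER_2 - brake pads, $240"
# ... send `safe` to the API, get CSV back, then:
csv_back = restore("CUSTOMER_1,CUSTOMER_2,brake pads,240", mapping)
# csv_back == "John Doe,12 Elm St,brake pads,240"
```

That way the API never sees the actual names, and the mapping never leaves my machine. (Finding the PII values to tokenize is the hard part, of course.)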

So before I discuss this with the client, I was wondering what OpenAI has to say about it, and what you guys are doing in similar situations.

(using API access)


Thanks, but that’s not exactly on point.

I’m aware of the Italy situation and the data fumbles OpenAI has had so far. But all of that concerns the ChatGPT website and the ChatGPT chat logs; those were website publishing problems, not AI security problems.

I’m talking about using the API interface to GPT itself (not ChatGPT), so there would be no ‘chat logs’, not even a ChatGPT user account.

Unless I record both ends of an API conversation at my end, it’s gone. There would be core GPT logs deep somewhere in OpenAI’s servers, but even I, as an API user, can’t get to them. (The more I talk about this, the better I feel about it.)
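Recording both ends at my end is cheap, by the way. A minimal sketch of what I mean, with `call_api` as a stand-in for whatever real client call you make (the log filename and record fields are just my own choices):

```python
import datetime
import json

LOG_PATH = "api_log.jsonl"  # one JSON record per request/response pair

def call_api(prompt):
    # Stand-in for the real API call; replace with your actual client code.
    return f"(model reply to: {prompt})"

def logged_call(prompt, log_path=LOG_PATH):
    """Call the API and append the full exchange to a local JSONL log."""
    reply = call_api(prompt)
    record = {
        "ts": datetime.datetime.utcnow().isoformat(),
        "prompt": prompt,
        "reply": reply,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return reply

logged_call("Extract the invoice fields as CSV.")
```

Then the only durable record of the conversation is the one on my own disk.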

(post deleted by author)

Sorry; given that I never used the word ‘chat’, I thought that was understood.

(post deleted by author)

True, but the API group is mostly guys who can’t get the API to work.

This is a more general GPT (although not ChatGPT) question.
Thanks again

In general, if you are concerned about the privacy or “security” of something, your best bet is not to send it to the API. This includes snippets of sensitive code, or records that contain sensitive PII. I haven’t seen any explicit guarantee of 100% privacy with the general API.

However, I have heard of private instances of GPT through Azure, though I’m not sure about the privacy assurances there either. Private LLMs will (hopefully) be rolling out, since the demand is there, but get your checkbook out, since the larger models cost some bucks $$$$ to run at scale.

There are also open-source models you might consider (and run on your own infrastructure), but as of right now they typically aren’t as capable and wide-ranging as the current paid OpenAI API versions. You would need to evaluate them yourself, depending on your requirements.


I completely agree with everything Curt just said.

I would just say: it’s THE INTERNET in general. The data is still being sent, even if you’re encrypting and anonymizing everything, and bad actors can target your connection without access to either endpoint. If you want to make absolutely sure that everything stays private, you’ll have to keep it local, unconnected, and behind a locked door.

This may be the most relevant in your case:

I’ll suggest you play around with the en_core_web_sm model from the spaCy library. I’ve been using that for dealing with large numbers of PDF files; it’s very simple, but it should be able to handle most if not all of your extraction tasks locally :laughing:
