Data Privacy and Limitations

Needless to say, I would never give GPT (or otherwise post) credit card numbers, medical records, Social Security numbers, or anything like that.

As a privacy advocate, I even anonymize code snippets before I post them on peer support sites, removing the app name, customer names, and any ‘good tricks’ I might have in the code.
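For what it’s worth, that kind of scrubbing can be done mechanically before posting. A minimal sketch (the term list and `REDACTED_n` placeholder scheme are just examples I made up, not any standard):

```python
import re

# Names to scrub from a snippet before posting it publicly.
# (Example values only -- in practice this would come from your own project.)
SENSITIVE_TERMS = ["AcmeApp", "ContosoMotors", "jsmith"]

def scrub(snippet: str) -> str:
    """Replace each sensitive term with a stable, neutral placeholder."""
    for i, term in enumerate(SENSITIVE_TERMS, start=1):
        snippet = re.sub(re.escape(term), f"REDACTED_{i}", snippet, flags=re.IGNORECASE)
    return snippet

print(scrub("Dim conn = AcmeApp.Connect('jsmith')"))
# Dim conn = REDACTED_1.Connect('REDACTED_3')
```

Because each term always maps to the same placeholder, the posted code still reads coherently.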

I have not been so cautious with GPT. I happily give it pages of unredacted code, assuming that it won’t be posted to the public web and that OpenAI isn’t going to steal my VB subs. And I feel good about that.

But now I have a borderline situation regarding client data:

I have a client who has folders of PDF and DOCX invoices, and I was thinking of handing each file to GPT to extract the invoice data and return it to me in CSV format, so I can build DB tables out of it.

It’s actually car-repair info, not medical records, but…
it does involve third-party info: specifically his customers’ names and addresses, and their cars. Then again, vehicle registration data and driver names and addresses, most of that is already on the web someplace.
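One middle-ground option I’ve been considering (my own sketch, not an OpenAI recommendation): pseudonymize the customer names and addresses before each file’s text goes to the API, keep the token-to-value mapping locally, and swap the real values back into the returned CSV. Roughly:

```python
# Sketch: replace customer PII with stable tokens before sending text out,
# keep the token->value map locally, restore values in the returned CSV.
def pseudonymize(text, pii_values):
    mapping = {}
    for i, value in enumerate(pii_values, start=1):
        token = f"CUSTOMER_{i}"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping

def restore(text, mapping):
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

invoice = "Invoice 1042: John Doe, 12 Elm St - brake pads, $240"
safe, mapping = pseudonymize(invoice, ["John Doe", "12 Elm St"])
# safe == "Invoice 1042: CUSTOMER_1, CUSTOMER_2 - brake pads, $240"
# ... send `safe` to the API, get CSV back, then:
csv_back = restore("CUSTOMER_1,CUSTOMER_2,brake pads,240", mapping)
# csv_back == "John Doe,12 Elm St,brake pads,240"
```

That way the API never sees the actual names, and the mapping never leaves my machine. (Finding the PII values to tokenize is the hard part, of course.)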

So before I discuss this with the client, I was wondering what OpenAI has to say about it, and what you guys are doing in similar situations.

(using API access)


Thanks, but that’s not exactly on point.

I’m aware of the Italy situation and the data fumbles OpenAI has had so far. But all of that concerns the ChatGPT website and the ChatGPT chat logs; those were website publishing problems, not AI security problems.

I’m talking about using the API interface to GPT itself (not ChatGPT), so there would be no ‘chat logs’, not even a ChatGPT user account.

Unless I record both ends of an API conversation at my end, it’s gone. There would be core GPT logs deep somewhere in OpenAI’s servers, but even I, as an API user, can’t get to them. (The more I talk about this, the better I feel about it.)
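Recording both ends at my end is cheap, by the way. A minimal sketch of what I mean, with `call_api` as a stand-in for whatever real client call you make (the log filename and record fields are just my own choices):

```python
import datetime
import json

LOG_PATH = "api_log.jsonl"  # one JSON record per request/response pair

def call_api(prompt):
    # Stand-in for the real API call; replace with your actual client code.
    return f"(model reply to: {prompt})"

def logged_call(prompt, log_path=LOG_PATH):
    """Call the API and append the full exchange to a local JSONL log."""
    reply = call_api(prompt)
    record = {
        "ts": datetime.datetime.utcnow().isoformat(),
        "prompt": prompt,
        "reply": reply,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return reply

logged_call("Extract the invoice fields as CSV.")
```

Then the only durable record of the conversation is the one on my own disk.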

(post deleted by author)

Sorry; given that I never used the word ‘chat’, I thought that was understood.

(post deleted by author)

True, but the API group is mostly guys who can’t get the API to work.

This is a more general GPT (although not ChatGPT) question.
Thanks again

In general, if you are concerned about the privacy or “security” of something, your best bet is not to send it to the API. This includes snippets of sensitive code, or records that contain sensitive PII. I haven’t seen any explicit guarantee of 100% privacy with the general API.

However, I have heard of private instances of GPT through Azure, though I’m not sure about the privacy assurances there either. Private LLMs will (hopefully) be rolling out, since the demand is there, but get your checkbook out, since the larger models cost some bucks $$$$ to run at scale.

There are also open-source models you might consider (and run on your own infrastructure), but as of right now they typically aren’t as capable and wide-ranging as the current paid OpenAI API versions. You would need to evaluate them yourself, depending on your requirements.


I completely agree with everything Curt just said.

I would just say: it’s THE INTERNET in general. The data is still being sent, even if you’re encrypting and anonymizing everything, and bad actors can target your connection without access to either endpoint. If you want to make absolutely sure that everything stays private, you’ll have to keep it local, unconnected, and behind a locked door.

This may be the most relevant in your case:

I’ll suggest you play around with the en_core_web_sm model from the spaCy library. I’ve been using that for dealing with large numbers of PDF files; it’s very simple, but it should be able to handle most if not all of your extraction tasks locally :laughing:
