Your GPT knowledge base - What happens with confidential data

Hi all, feeling a little nervous, this is my first post in the community and I have not found an answer anywhere else.

I am creating a GPT to welcome new board members and want to upload an Onboarding Guide specific to the organization as its knowledge base. Currently I only dare to include publicly available information, but that makes it much less useful.

What would or could happen with any confidential data in the knowledge base? I know that users could obtain it through prompting, but if the link is only available to board members under an NDA (non-disclosure agreement), that is OK. I also understand that I can fill out a form to ask OpenAI not to train its LLM on the data. Is that reliable? Are there any other risks involved in uploading confidential data to the knowledge base?

Do you know, or do you know where I could ask? THANK YOU big time in advance. Kind regards, Dagmar

I personally wouldn’t, at least with a public-facing GPT.

Why? Well, because if someone asks the GPT to repeat its prompt or the contents of its files, it may well do so and leak your sensitive information. Even if you instruct it not to repeat them, people sometimes find a way around that. It’s a cat and mouse game. So the issue doesn’t lie with OpenAI, but with the users and the implementation itself.

If it’s a private GPT (don’t even know if that’s an option) then it’s another story.

If you tell OpenAI not to collect your data, and they answer the request, then they probably don’t. They’re at least legally bound not to.

My 2 cents.

3 Likes

Thank you @Fusseldieb much appreciated.
Not aware of a “private” GPT either, but assumed that distribution of GPT links can be limited (“only those with link can access”) vs. public.
I share your concerns…

1 Like

There are solutions to protect your data against all the current “data reveal” prompts, but there are likely new ones being dreamed up as we speak, so you can never be 100% sure. Best to make it a “requires link” only GPT.

1 Like

distribution of GPT links can be limited (“only those with link can access”)

That somewhat limits the scope of users able to access it, but remember, Google loves to index things.
Therefore, if a user leaks the link, even by accident, in a Facebook post, a Reddit comment, a forum or anywhere else, Google will index it and it’ll show up in the Google results when people search for GPTs. There’s even a search term to look only for custom GPTs on Google.
It only requires a single person to leak it accidentally and you suddenly have random people asking it things and trying to jailbreak it to repeat the prompt.
You’ll be none the wiser until someone posts their findings. A bit far fetched, but not impossible at all.

On a similar note: it’s the same story as people entering random video conferences, doing stupid stuff and getting kicked. It only requires a single person to post the link somewhere public and you’ll have visitors pretty soon.

1 Like

ChatGPT Enterprise does not use data for training. On other plans, you can also choose not to have your data stored via the “Chat history & training” option in the ChatGPT settings, but it must be switched off for every account that is used.

1 Like

Hi Dagmar,

There are providers like Microsoft that offer the same tech stack, in Microsoft’s case via Azure, but with more control over sensitive information. Depending on the budget, it doesn’t have to be Microsoft.

Do you use your knowledge base for other chatbots, too? How many documents does it contain?

1 Like

Thank you @Foxalabs. I am not using the Enterprise version, but was trying to demo to other board members what you can do, so this is useful to share. Much appreciated.

Hi @MOVR, thank you. Currently I have 3 documents with publicly available reports (annual report, impact report). At the moment, I have just created a “prototype” so I can demo the capabilities to fellow board members, and you are all helping me to overcome some objections. For their implementation they would need to involve their tech experts and decide on their budget etc.

With regard to the number of documents… I wasn’t sure it could read the second and third documents very well, and realized later that I may have needed to “train” it, so I have been asking it questions about them step by step.

Much fun trying it out!

1 Like

Enterprise customers can deploy internal-only GPTs

Since we launched ChatGPT Enterprise a few months ago, early customers have expressed the desire for even more customization that aligns with their business. GPTs answer this call by allowing you to create versions of ChatGPT for specific use cases, departments, or proprietary datasets. Early customers like Amgen, Bain, and Square are already leveraging internal GPTs to do things like craft marketing materials embodying their brand, aid support staff with answering customer questions, or help new software engineers with onboarding.
This info is from OpenAI’s “Introducing GPTs” announcement. I assume you would want this.

I’m keeping my fingers crossed that you will convince the board members :crossed_fingers:

I haven’t created a custom GPT yet, but I’m curious about the above… Is it possible to have the GPT ask for the data source, and have the user give it the link to the .pdf that will be used as the information? That way it would not hold any data to reveal…

This type of solution, where you need to train on internal documents and keep the information confidential, is available from Unith, who use ChatGPT as the backend.

They have a number of use cases currently being trialled with Enterprise clients, and they will shortly launch a self-service platform.

They are currently in a two-year contract with one of the Big Five tech companies as a client (one of Meta, Apple, Amazon, Alphabet or Microsoft), using internal documents for training staff.

Because of an NDA, details of the partner and the application have not been publicly released, but because of this relationship they are well aware of the requirements for keeping training data sets confidential.