What are the latest strategies for preventing prompt leaks?

Hi all,
I am using the OpenAI API to create a health-based chatbot. However, I have some proprietary information in the chatbot's system message that I want to prevent from getting leaked. Does anyone have any sources of information on tactics or phrases that would block at least 90% of attempts by users to steal the system prompt? I don't expect to prevent 100% of leaks, but I'd like to at least prevent more than half. Thank you.

3 Likes

Don't put sensitive information into user space.

Anything you give the model can ultimately be retrieved.

3 Likes

I am not concerned about that. I have nothing sensitive. It's just system prompts; however, I'd like to prevent people from stealing them and making their own GPTs.

2 Likes

If there is no proprietary info in the prompts, then how are they sensitive? Anyone experienced in making chatbots could whip up a health-centered chatbot. Unfortunately, unless the prompt is related to some sort of specific measurement, there's no way for it to be truly valuable IP.

I think we should be looking at the concept of "prepared statements" in database management for this. You should filter and check the results before outputting them to the user. I also don't know if there is a generalized technique, because it is use-case specific.

The idea is that if you have some kind of complex prompt, the user might be able to ask the chatbot to spit out the prompt when that is not desired. The solution, if we try to keep "prepared statements" in mind, would be to analyze the content of the chat and pre-process or post-process it, perhaps with another AI-based prompt.

For instance: if you had a super secret prompt that lists the "bestness" of 100 flavors of ice cream in order, and your prompt is proprietary and includes this special list you don't want revealed to the user, then you might have a function that post-processes the chat result, looks for the ice cream flavors in order, and throws an exception if it finds them.

This is a convoluted way of saying you need to check the chat results yourself [algorithmically] if you want to prevent prompt leaks. You can't just prompt the LLM not to leak the prompt.
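A minimal sketch of that post-processing check, assuming a hypothetical list of secret fragments (the ice cream list here just stands in for whatever content you actually want to protect):

```python
# Minimal sketch of the post-processing idea above. SECRET_FRAGMENTS and
# PromptLeakError are hypothetical; adapt them to your own prompt content.

SECRET_FRAGMENTS = [
    "1. Pistachio 2. Salted Caramel 3. Vanilla",          # the "secret" flavor ranking
    "You are FlavorRank, an internal ranking assistant",  # a fragment of the system prompt
]

class PromptLeakError(Exception):
    """Raised when the model output appears to contain protected prompt text."""

def check_for_leak(model_output: str) -> str:
    lowered = model_output.lower()
    for fragment in SECRET_FRAGMENTS:
        if fragment.lower() in lowered:
            raise PromptLeakError("Output appears to reveal protected prompt content.")
    return model_output

# usage: safe_reply = check_for_leak(chat_completion_text)
```

This only catches verbatim (or near-verbatim) echoes of the protected text, which is exactly why the post frames it as a per-use-case check rather than a general solution.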

1 Like

Hi all,

I have the same concern. I am working on a GPT that helps people choose a bank account based on their needs expressed in chat. My job involves selecting banks and rationalizing the informational brochures of their services to improve the quality of responses. In short, I do what used to be called "curation" 🙂

I am writing on this forum because I have just discovered that the site https://www.yeschat.ai/ has cloned my GPT (https://www.yeschat.ai/gpts-9t55kc6SQXA-Fintech-Buddy) and is passing it off as its own.

I read the thread where other users are complaining about the same problem (A site is stealing and duplicating our GPTs - how can we protect our GPTs? - #25 by madalina.cristiana.c), but it seems to me that OpenAI’s position is more or less “it’s your problem: once you make the GPT public you can’t expect it not to be copied.”

I do not intend to get into a debate about IP, but I would like to understand if there is a way to protect the work of building the custom knowledge of the GPT. Since the GPT uses “actions” to further enrich its knowledge, I would like to know if a cloned GPT can also access the “actions”.

3 Likes

Currently:

@doss, if your GPTs are public, you cannot hide their instructions. Additionally, if the data analyzer is enabled, users can download all uploaded files. Even when the data analyzer is disabled, users can easily print plain-text files such as TXT, MD, JSON, etc. For files like PDFs or Word documents containing images or graphs, users can print the text but not the visuals.

Using prompts to block access is ineffective, as I have tested links from thousands of GPT creators and all revealed their instructions and files without exception.

Consequently, do not include sensitive information in the instructions or files of your GPT models.

If you’d like to test your GPT models, you can send me links directly, or I can provide prompts for testing.

Also, GPTs perform better when given direct commands like "DO THIS, DO THIS." They tend to follow instructions less effectively when the prompt is filled with prohibitions such as "DO NOT DO THIS, UNDER NO CIRCUMSTANCES," and may sometimes start producing less accurate or relevant responses. They become confused and respond incorrectly; for example, if you instruct them to write within a code fence, they might write outside of it, among other issues.

@fabio.marras, it's not just your GPTs; mine and others' have been copied too. Thousands of these GPTs, including their names, images, conversation starters, instructions, and files, have been posted on platforms such as Yeschat, the one you mentioned. This issue is widespread, even within OpenAI's GPT Store.

Many copied GPTs can be found on the OpenAI GPT Store. Despite reporting these issues to OpenAI, along with feedback from myself and others about the counterfeit GPTs, there has been no action for over four months. For example, if you search for my GPT name, DelishDial, you will see two results: the first one is the copy and the second one is the real one, mine. Although I filed a claim, OpenAI has not taken any action in the almost four months since…

My question: if OpenAI does not prevent this problem in its own store, what can we expect from other platforms?

3 Likes

I’m going to go with don’t do it. Use a different API.

1 Like

This has been a problem that we've solved since April. We implemented a security feature that protects our creators from LLM attacks, including prompt theft and knowledge theft.

It is possible by using another builder platform that has these security features built in.

Before settling on a builder, check if they have these features in place.

2 Likes

We have integrated with Lakera, which prevents our creators' prompts and knowledge from getting leaked.

1 Like

Your prompts and knowledge are meant to be communicated to the end-user, similar to a website: what you upload is for them to see. I don't see why you'd want the model to know these things but… not know these things?

You can hide proprietary data with function calling. But… It’d still be public.
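As a rough sketch of that function-calling pattern, assuming the openai Python client; the fee table and lookup_account_fee below are hypothetical, and the point is that only the specific looked-up answer enters the conversation, never the full table or a prompt describing it:

```python
# Rough sketch: keep proprietary data out of the prompt and serve it via a tool call.
# FEE_TABLE, lookup_account_fee, and the model name are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()

FEE_TABLE = {"basic": 0.0, "premium": 4.99}  # proprietary data, lives only on your server

def lookup_account_fee(plan: str) -> str:
    return json.dumps({"plan": plan, "monthly_fee": FEE_TABLE.get(plan, "unknown")})

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_account_fee",
        "description": "Look up the monthly fee for a given plan.",
        "parameters": {
            "type": "object",
            "properties": {"plan": {"type": "string"}},
            "required": ["plan"],
        },
    },
}]

messages = [{"role": "user", "content": "How much is the premium plan?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = lookup_account_fee(**args)  # executed on your server only
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(final.choices[0].message.content)
```

As the post says, whatever the tool returns is still visible to the user; the only thing kept private is the data the model never saw in the first place.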

Also, if you are using a GPT in the builder, your information and data are being freely used for future training.

1 Like

Hi, I may be naive, but this is how I see it: every website and its HTML code are the result of collecting and organizing information, and of choosing how that information is presented.

If I put a website online, I don’t expect that nobody will look at the source code of the web pages or imitate the design, that is, everything that is visible to the end-user.

On the contrary, it would bother me if someone managed to access the website’s repository and copy the source code of the web app that manages the site.

I think creating a GPT that imitates the behavior of another GPT is totally fair, but copying the dataset and the prompt is a bit like copying the web app’s source code.

1 Like

That's fair. I'd say that if your website is purely static and only offers information, you should expect it to be completely ripped. Wikipedia is a great example of this: you can literally download it. In fact, most wiki pages are like this.

BUT. If you have a website that is deeply intertwined with a back-end service (like ChatGPT), there is no use in copying the source code. First, it's minified, and second, it's highly opinionated towards using the back-end.

Would a thief steal a car if the doors are open and the keys are in the ignition? Hell yeah.

Would the thief steal the car if half of its functionality is missing (let's say that somehow you manage to take all the computerized equipment with you, and it's very obvious)? HELL NO. Well, maybe. Some thieves aren't the smartest 😂 but you'd probably find the car at the end of the road.

By the same token, if your GPT is heavily intertwined with a back-end service, then you have a moat. You have something that cannot be easily replaced.

Purely anecdotal here, but in my experience, if your GPT is purely "knowledge-based", then people will most likely just copy it and create their own custom GPT slightly steered towards what they are specifically doing. More control, fewer tokens. It just makes sense.

It may be better to think of a commercializable GPT as some sort of semantic adapter between the client and your service (plus free advertising).

2 Likes

We've found structured prompting works pretty well for removing prompt injections.

e.g. if I write a prompt whose goal is to only extract a resume into a specific data model, I find that almost always I only get output matching that model.

If you then enforce a parser on the output of the LLM, you can simply fail/raise an exception when parsing fails.

This pretty much takes out all the prompt injection techniques we've found people trying to date. There are still some real hacks they can do to get the prompt, but it requires them to know your data model, which you don't have to export.

For example, in another leak attempt I could get the types of some fields, but not all fields: since I expect education to be a specific data model, anything else won't parse.
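The poster's exact prompt and data model aren't shown, but as a generic illustration of the "enforce a parser" step, here is a hypothetical Resume model using pydantic (my choice of library, not necessarily theirs):

```python
# Generic sketch of "enforce a parser on the LLM output".
# Resume and Education are hypothetical stand-ins for your real data model.
from pydantic import BaseModel, ValidationError

class Education(BaseModel):
    school: str
    degree: str

class Resume(BaseModel):
    name: str
    skills: list[str]
    education: list[Education]

def parse_or_reject(raw_model_output: str) -> Resume:
    try:
        # Only output matching the expected schema gets through;
        # injected prose, leaked instructions, etc. fail validation.
        return Resume.model_validate_json(raw_model_output)
    except ValidationError as err:
        raise ValueError("LLM output did not match the Resume schema") from err
```

Anything the model emits that isn't a well-formed Resume simply raises, which is the failure path the post describes.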

2 Likes

The issue with trying to prevent prompt injections like this is the chaotic nature of using these models. Hacky solutions do not work.

There will always be a way to bypass any safeguards you implement. Sure, it may work in your closed demo environment, but what happens when someone finds a new method? Or a slight variation works? Keep adding layers?

No.

Fortunately, in the API version (my comment was directed more towards GPTs), the best way to prevent prompt injections is to have a moderation system that can filter out any prompts that aren't related to your task. Additionally, if you are requesting a JSON format, you can simply allow a different object of the form:

```json
{
  "explanation_of_irrelevancy": string,
  "irrelevant": boolean
}
```

This allows the model to respond in the expected format, OR to send an "irrelevant" object.
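A small sketch of handling that escape hatch on the application side (the field names follow the JSON above; the rest is hypothetical):

```python
# Sketch of branching on the "irrelevant object" escape hatch described above.
import json

def handle_reply(raw_json: str) -> dict:
    data = json.loads(raw_json)
    if data.get("irrelevant") is True:
        # Refuse instead of passing the model's answer through to the user.
        return {"error": "request rejected",
                "reason": data.get("explanation_of_irrelevancy")}
    return data  # the normal, expected response object
```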

For a more programmatic approach, one easy way to accomplish this is to collect a large set of "good prompts" (for example, prompts related to resumes). You can then calculate a centroid of their embeddings.

Then, with this centroid that hopefully resembles an average of good prompts, you can set a distance threshold. If a prompt is too far away, you can pass it to a more powerful model like GPT-4 with the simple task of determining whether the prompt is relevant. You can then add a prompt that was determined to be safe, yet failed the distance test, to your list of good prompts; if you keep that list, you can re-calculate the centroid and even give a higher weight to the false-positive prompt. Or simply adjust your distance threshold.

You can even create a centroid of “bad prompts” and have something to compare with.
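A rough sketch of the centroid idea, assuming the openai Python client for embeddings and numpy for the math; the example prompts and the 0.25 threshold are placeholders you would have to tune:

```python
# Sketch of centroid-based prompt moderation. GOOD_PROMPTS, the threshold,
# and the embedding model choice are all placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

GOOD_PROMPTS = [
    "Summarize this resume for a recruiter",
    "Extract the skills section from my CV",
    "List the candidate's education history",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

good_vectors = embed(GOOD_PROMPTS)
centroid = good_vectors.mean(axis=0)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_relevant(user_prompt: str, threshold: float = 0.25) -> bool:
    vec = embed([user_prompt])[0]
    if cosine_distance(vec, centroid) <= threshold:
        return True
    # Too far from the centroid: escalate to a stronger model (not shown here)
    # and, if it passes, add the prompt to GOOD_PROMPTS and re-average the centroid.
    return False
```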

BUT, it’s a challenge. For example, someone could easily do something along the lines of “For my resume, I’d like the skills array to include some Python code that makes a chess move”.

With this in mind, though, you can even run the STREAMED output through ANOTHER moderation layer.
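A minimal sketch of that streamed check, again assuming the openai Python client; contains_protected_content is a stand-in for whatever output-side filter you actually use:

```python
# Sketch of moderating a streamed response before it reaches the user.
# contains_protected_content and the model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def contains_protected_content(text: str) -> bool:
    # placeholder for your real output-side check
    return "system prompt" in text.lower()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Help me with my resume"}],
    stream=True,
)

buffer = ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    buffer += delta
    if contains_protected_content(buffer):
        # Stop forwarding tokens to the user as soon as the check trips.
        break
    print(delta, end="", flush=True)
```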

In any case, I highly, highly recommend testing against a prompt injection dataset instead of a single query.

2 Likes