What are the latest strategies for preventing prompt leaks?

Hi all,
I am using the OpenAI API to create a health-based chatbot. However, I have some proprietary information in the chatbot's system message that I want to prevent from getting leaked. Does anyone have sources of information on tactics or phrases that would block at least 90% of users' attempts to steal the system prompt? I don't expect to prevent 100% of the leaks, but I'd like to stop more than half. Thank you.

Don't put sensitive information into user space.

Anything you give the model can ultimately be retrieved.


I am not concerned about that. I have nothing sensitive; it's just system prompts. However, I'd like to prevent people from stealing them and making their own GPTs.

If there is no proprietary info in the prompts, then how are they sensitive? Anyone experienced in making chatbots could whip up a health-centered chatbot. Unfortunately, unless the prompt is tied to some sort of specific measurement, there's no way for it to be truly valuable IP.

I think we should be looking at the concept of "prepared statements" from database management for this: filter and check the results before outputting them to the user. I don't know if there is a generalized technique, because it is use-case specific.

The idea is that if you have some kind of complex prompt, the user might be able to ask the chatbot to spit that prompt out, which is not desired. The solution, if we keep "prepared statements" in mind, would be to analyze the content of the chat and pre-process or post-process it, perhaps with another AI-based prompt.

For instance: if you had a super-secret prompt that ranks the "bestness" of 100 flavors of ice cream, and your proprietary prompt includes this special list you don't want revealed to the user… then you might have a function that post-processes the chat result, looks for the ice cream flavors in order, and throws an exception if they are found.

This is a convoluted way of saying you need to check the chat results algorithmically if you want to prevent prompt leaks. You can't just prompt the LLM not to leak the prompt.
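As a rough illustration of that post-processing idea, here is a minimal Python sketch. Everything in it is hypothetical (the names SECRET_PROMPT_FRAGMENTS, contains_leak, and postprocess, plus the placeholder flavor strings); the point is just to show scanning the model's reply for fragments of the protected prompt and withholding it if too many show up.

```python
import re

# Hypothetical placeholders -- replace with fragments of your own system
# prompt that you consider sensitive (e.g. entries from the ranked flavor list).
SECRET_PROMPT_FRAGMENTS = ["vanilla", "pistachio", "rocky road"]


class PromptLeakError(Exception):
    """Raised when the model output appears to contain protected prompt content."""


def contains_leak(reply: str, fragments: list[str], threshold: int = 2) -> bool:
    """Return True if the reply contains several protected fragments (case-insensitive)."""
    hits = sum(
        1 for frag in fragments
        if re.search(re.escape(frag), reply, re.IGNORECASE)
    )
    return hits >= threshold


def postprocess(reply: str) -> str:
    """Check the chat result before it is returned to the user."""
    if contains_leak(reply, SECRET_PROMPT_FRAGMENTS):
        raise PromptLeakError("Reply withheld: possible system prompt leak detected.")
    return reply


# Usage (model_reply is whatever your normal API call returned):
# safe_reply = postprocess(model_reply)
```

A simple substring/threshold check like this is crude; in practice you might instead pass the reply through a second classification prompt, but either way the filtering happens in your own code, outside the model.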