How to avoid GPTs give out it's instruction?

curt.kennedy · August 30, 2024, 4:24am

There are a lot of jailbreaking threads out there, but here is one from a while ago.

So many ways to jailbreak … like sending the information in base64 encoding to subvert basic filters. Or creating a few shot cypher by in-context learning on a permutation of the alphabet. The list goes on and on.

So the only way to actually secure the system, is to use embeddings and map anything that comes in to a “safe” thing you have predefined.

But this may severely hamper the creativity from the LLM, and you need to create all these safe inputs.

I call this “proxy prompts”, or essentially a walled garden. But in theory there is no real way to jail break this because you are controlling and filtering all inputs, and mapping them to prompts that aren’t going to jailbreak anything.

Topic		Replies	Views
How to Avoid the Prompts/Instructions, Knowledge base, Tools be Accessed by End Users? Prompting gpt-4 , chatgpt , hacking	28	10817	April 25, 2024
Unveiling Hidden Instructions in Chatbots Bugs bug , risks	18	10277	February 5, 2024
Basic safeguard against instruction set leaks Prompting gpt-4 , chatgpt , bug , prompt-engineering , gpts	46	8644	March 4, 2024
Challenge: Hack this prompt! API	14	5790	May 1, 2024
The Prompt-Defender Initiative: Advancing GPT Safety Standards Prompting gpt-4 , chatgpt , api	3	2229	May 22, 2024

How to avoid GPTs give out it's instruction?

Related topics