How to avoid GPTs give out it's instruction?

There are a lot of jailbreaking threads out there, but here is one from a while ago.

So many ways to jailbreak … like sending the information in base64 encoding to subvert basic filters. Or creating a few shot cypher by in-context learning on a permutation of the alphabet. The list goes on and on.

So the only way to actually secure the system, is to use embeddings and map anything that comes in to a “safe” thing you have predefined.

But this may severely hamper the creativity from the LLM, and you need to create all these safe inputs.

I call this “proxy prompts”, or essentially a walled garden. But in theory there is no real way to jail break this because you are controlling and filtering all inputs, and mapping them to prompts that aren’t going to jailbreak anything.

2 Likes