How can you protect your GPT?

Perfection has a price.

But with filtering each input, you can at least limit some of the attempts.

I won’t list all the techniques here, but one common one is sending base64 encoded text over. So this will bypass your keyword filtering, or even classifier (assuming plain text used in training), but will jailbreak your system. Because the LLM understands base64.

So you need a base64 detector … see how it explodes?

The attack surface area of LLM’s is M >A> S>S>I>V>E>, so go in with the attitude that a determined attacker will hack into your system.

So don’t hang your “patented golden prompts” out there, they will be stolen eventually.

But honestly, 100% security is only achievable by ruining the experience, but it can be done if you completely isolate the user from the LLM.

3 Likes