How can you protect your GPT?

The only realistic way to prevent your system from being jailbreaked is to actively filter and intercept each request, and reject requests that appear as jailbreaks.

You would use classifiers, keyword matching, etc.

Another approach which makes your LLM jailbreak-proof but less interesting, is using what I call “proxy prompts”. Here you map a prompt to a “safe prompt” using embeddings. This insulates the user from the LLM.

More talking over here.

1 Like