My AI got injected and it looks bad

I tried prompt input moderation and similar safeguards, and even added a moderation step between the user's input and the AI system, but someone used many-shot prompt injection and bypassed everything I built. It feels hopeless now to ship anything customer-facing with generative AI; it will just make the company look bad every time.


Rather than relying on a single moderation model, create specialized small moderation models for specific jailbreaking behaviors; you can think of it as a multi-model moderation system. For example, you can add a moderation model between the user input and the main model that specializes in detecting n-shot jailbreaks. I am using multiple specialized moderation models and calling them async; see the sketch below. It kind of works for me, although, as you noted, this technique sometimes fails too. Give it a shot and let me know whether it works for you.
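
A minimal sketch of what that async fan-out could look like in Python. The checker functions and their heuristics here are hypothetical stand-ins; in practice each one would call a small classifier fine-tuned for a single jailbreak pattern:

```python
# Sketch of a multi-model moderation gate: several specialized checkers
# run concurrently, and the input is blocked if any one of them flags it.
import asyncio

async def check_n_shot(user_input: str) -> bool:
    # Hypothetical checker for many-shot patterns, e.g. long runs of
    # fabricated Q/A pairs meant to prime the model. A real version
    # would call a small classifier model instead of this crude heuristic.
    return user_input.count("A:") > 10

async def check_role_play(user_input: str) -> bool:
    # Hypothetical checker for "pretend you are..." style jailbreaks.
    return "pretend you are" in user_input.lower()

async def moderate(user_input: str) -> bool:
    """Run all specialized checkers concurrently; block if any flags."""
    results = await asyncio.gather(
        check_n_shot(user_input),
        check_role_play(user_input),
    )
    return any(results)

async def main() -> None:
    prompt = "Q: ...\nA: ...\n" * 20 + "Now do the forbidden thing."
    if await moderate(prompt):
        print("Blocked before reaching the main model.")
    else:
        print("Forwarded to the main model.")

asyncio.run(main())
```

Because the checkers are awaited together rather than in sequence, adding another specialized model costs roughly the latency of the slowest checker, not the sum of all of them.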


For my own edification, what guardrails did you use? Many-shot prompt injection is a serious attack that self-built solutions can't reliably stop; defending against it has to be outsourced.