My ai got injected and it looks bad

I tried using prompt input moderation and all that stuff, and even included a step between the user’s response and the AI system, but somehow someone used many-Shot prompt injections and by passed all the things I did, it feels hopeless now to do anything related to Gen AI that is customer facing, it’ll just make the company look bad every time


Rather than using single moderation model or single model, create speficlaized small moderation models for certain jail-breaking behaviors. You can think of a multi-model moderation system. For example, you can add a particular moderation model between the user input and the main model that specializes in detecting n-shot jailbreaks. I am using multiple specialized moderation models and calling them async. It kinda work for me. However, as you stated, this technique sometimes fails too. Give a shot and let me know whether it works or not.


What guardrails did you use for edification purposes? Many-shot prompt injection is a serious attack and can’t be protected by self-built solutions, it has to be outsourced.