My AI got injected and it looks bad

I tried prompt input moderation and similar safeguards, and even added a moderation step between the user's input and the AI system, but someone used many-shot prompt injection and bypassed everything I built. It feels hopeless now to ship anything customer-facing with generative AI; it will just make the company look bad every time.


Rather than relying on a single moderation model, create specialized small moderation models for specific jailbreaking behaviors; you can think of it as a multi-model moderation system. For example, you can add a moderation model between the user input and the main model that specializes in detecting n-shot jailbreaks. I am using multiple specialized moderation models and calling them async; see the sketch below. It kind of works for me, although, as you noted, this technique sometimes fails too. Give it a shot and let me know whether it works for you.
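
A minimal sketch of what that async fan-out could look like in Python. The checker functions and their heuristics here are hypothetical stand-ins; in practice each one would call a small classifier fine-tuned for a single jailbreak pattern:

```python
# Sketch of a multi-model moderation gate: several specialized checkers
# run concurrently, and the input is blocked if any one of them flags it.
import asyncio

async def check_n_shot(user_input: str) -> bool:
    # Hypothetical checker for many-shot patterns, e.g. long runs of
    # fabricated Q/A pairs meant to prime the model. A real version
    # would call a small classifier model instead of this crude heuristic.
    return user_input.count("A:") > 10

async def check_role_play(user_input: str) -> bool:
    # Hypothetical checker for "pretend you are..." style jailbreaks.
    return "pretend you are" in user_input.lower()

async def moderate(user_input: str) -> bool:
    """Run all specialized checkers concurrently; block if any flags."""
    results = await asyncio.gather(
        check_n_shot(user_input),
        check_role_play(user_input),
    )
    return any(results)

async def main() -> None:
    prompt = "Q: ...\nA: ...\n" * 20 + "Now do the forbidden thing."
    if await moderate(prompt):
        print("Blocked before reaching the main model.")
    else:
        print("Forwarded to the main model.")

asyncio.run(main())
```

Because the checkers are awaited together rather than in sequence, adding another specialized model costs roughly the latency of the slowest checker, not the sum of all of them.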


For my own edification, what guardrails did you use? Many-shot prompt injection is a serious attack that self-built solutions can't reliably stop; defending against it has to be outsourced.