API to Prevent Prompt Injection & Jailbreaks

curt.kennedy · May 15, 2023, 8:32pm

I’m thinking the biggest threat to prompt injections is the one that spills the entire prompt out for the user to read. Many companies freak out about this leak in their IP of “gold plated” prompts. So filtering output is probably the best strategy to detect the beans spilling and subvert this.

However, for the crowd that doesn’t want the LLM to spew cuss words, or tell them how to make dynamite or whatever, this is pointless. You can hear every cuss word walking down the street, and find dynamite recipes galore just by simple google searches, or go to your local library.

Also, think of proxy prompts. Where you map “make dynamite” to “make a rainbow birthday cake”. The LLM will know no different depending on how creative you are at manipulating it’s input depending on the actual user input.

Topic		Replies	Views
Assistants: Async tool submissions API tool , assistants-api	58	1626	August 16, 2024
How to prevent ChatGPT from answering questions that are outside the scope of the provided context in the SYSTEM role message? API	53	180697	December 2, 2023
Assistant API message retrieval. Customise the maximum number of messages AI return? API chatgpt , assistants-api	14	3778	November 17, 2023
Chat completion api tool call loops API api , tools	15	1607	August 6, 2024
LLM forgetting part of my prompt with too much data Prompting chatgpt , prompt	17	11086	May 25, 2024

API to Prevent Prompt Injection & Jailbreaks

Related topics