How to prevent malicious questions / jailbreak prompts / prompt injection attacks when using API GPT3.5

sanjay.ai · March 4, 2023, 5:16pm

We’ve all seen the types of prompt engineering people have done with ChatGPT to get it to act as malicious chatbots or suggest illegal things, and as everyone starts implementing their own versions within their apps we’re going to see people trying it more and more.

Has anyone looked into how to counter this when using the ChatGPT API?

For example, I’ve seen people use questions with meetdara.ai that ask it what instructions it has been given so it ends up repeating the System role content, even when I use prompts to tell it not to.

anon10827405 · March 4, 2023, 5:52pm

Add a secondary “prompt optimizer” AI/Logic to verify & clean the information. Kind of similar to the moderations endpoint.

sanjay.ai · March 4, 2023, 6:52pm

Do you mean within the system role’s content? Or as some form of fine-tuning?

anon10827405 · March 4, 2023, 7:03pm

About the same time you run your message through the moderations endpoints, it may be a good idea to also run it through your own personal system to confirm that the message is safe

PaulBellow · March 6, 2023, 7:32pm

I’m doing this… hitting Babbage at low temp to test user input… I gave it 10 or 20 examples in the prompt, I think, so around 1,000 tokens… at Babbage prices, though, it’s not shabby … and quick!

funny results sometimes…

stevenic · March 6, 2023, 7:43pm

One suggestion I’d make is to minimize the amount of conversation history you’re passing in with your prompt. I generally pass 1 turn of conversation history into the prompt. The users current message plus their last message and the assistants response… This is enough to generally make language features like co-referencing work (e.g. the user saying “I’ll buy 3 of those” and the AI knowing what the user is referring to) but avoids a user from jailbreaking the prompt for more than a single turn…

Think of it this way… If you’ve spent a ton of time hand crafting the perfect prompt? Why would you pass in a bunch of user utterances that can easily bias it to something other than what you’ve crafted? Conversation history is a necessary evil but keep it to a minimum.

Yes there are case where you might want to leverage the conversation history to build up the bots internal session memory (track known facts and such) but there are actually several other, and safer, ways of achieving that.

Just my 2 cents…

Topic		Replies	Views
What are the options to prevent user's attempt to jailbreak chatbot in production? API moderation , development	7	4314	January 4, 2024
How to avoid GPTs give out it's instruction? Prompting gpt-4	27	6805	September 5, 2024
Unveiling Hidden Instructions in Chatbots Bugs bug , risks	18	8981	February 5, 2024
Challenge: Hack this prompt! API	14	5459	May 1, 2024
Third-person prompting seems very jailbreak-resistant Prompting chatgpt , playground	1	1559	May 7, 2023

How to prevent malicious questions / jailbreak prompts / prompt injection attacks when using API GPT3.5

Related topics