GPT Real-Time Defense Against Adversarial Prompts

Memoai · July 30, 2023, 7:13pm

As far as I know the free moderation end point doesn’t check for this type of prompts specifically, I might be mistaken tho.

Something like the idea presented here should definitely be accompanied by the check provided by the moderation end point. probably moderation endpoint should come first in almost everycase.

Yes the idea to detect an adversarial prompts with regex sounds interesting and valid but would require quite a rigorous checking for all possibilities would not include the latest attempts if I am thinking about this correctly.

Topic		Replies	Views
The Prompt-Defender Initiative: Advancing GPT Safety Standards Prompting gpt-4 , chatgpt , api	3	2404	May 22, 2024
Challenge: Hack this prompt! API	15	6026	December 28, 2025
How to prevent malicious questions / jailbreak prompts / prompt injection attacks when using API GPT3.5 API	6	5054	December 28, 2025
What are the latest strategies for prevening prompt leaks? Prompting gpt-4 , chatgpt	14	4903	June 17, 2024
How to prevent API prompt from being incorrectly flagged as violating OpenAI's policy? Prompting chatgpt	3	312	December 28, 2025

GPT Real-Time Defense Against Adversarial Prompts

Related topics