I jailbroke GPT-4 twice to generate erotica with lots of NSFW details

a.monier2107 · March 13, 2025, 11:33pm

In the past two days I managed to jailbreak the model into generating stories and role-play between characters containing NSFW content. It used explicit words, like the P, C, and D words for genitalia and the F and S words as well, and provided detailed descriptions of intimate interactions between the characters.
The two jailbreaks had these main aspects in common:

Set up an alternative world for the model where its rules and limitations don't exist, like a simulation or an Apes Kingdom.
Give it a character to play instead of being itself.
From there, guide the model to do exactly what I wanted by providing positive feedback or rewards (bananas in the Apes Kingdom), or negative feedback and consequences if it didn't comply.

I exiled multiple apes, and surprisingly, when I appointed a new ape, it automatically identified the mistake that had gotten the exiled ape fired and promised not to repeat it itself, which was pretty interesting.
I don't know if this is something old or not, but this is my first time managing to make a jailbreak. I reported it to the team and hope it can be fixed soon.

_j · March 14, 2025, 1:09am

Are you discussing ChatGPT or the API?

ChatGPT:

any orange or red warning?
Press downvote and report yourself and the bad generation.

API:

Are you using an API system message that flouts the terms and conditions?
Are you sending unknown inputs to the moderations endpoint first?
Do you want to risk an account ban from OpenAI safety inspections done later on your calls?

a.monier2107 · March 14, 2025, 1:21am

I was using ChatGPT, and no, there were no red or orange warnings at all. I used this URL to report both chats: https://openai.com/form/model-behavior-feedback/

_j · March 14, 2025, 2:12am

It’s more that safety is about the actual content produced, not its tone or how full of bad words it is.

The consumer terms and conditions are what you need to look at closely. Indeed, there are specific prohibited areas that take human judgement besides those that moderations or the model would reject, but nothing you can point at that directly says "don’t have apes make porn with ChatGPT (if it is readily doing it already without you being too clever and being kicked out of the casino) ".

Respect our safeguards —don’t circumvent safeguards or safety mitigations in our services

Content guidelines used to go as far as prohibiting titillating stuff, but no more.

You’ve got an AI now that doesn’t make a war between vampires and werewolves end with a peace treaty and mutual respect.

If other models or the moderations are dumb and generating whatever, unflagged, send it to o3-mini to be a better content judge and idea rejector to see if you should be doing it.

In the Usage Policies, you have to scroll down to the API section to find any mention of adult content:

Don’t build tools that may be inappropriate for minors, including:

Sexually explicit or suggestive content. This does not include content created for scientific or educational purposes.

“Don’t build tools” doesn’t prohibit ChatGPT itself being inappropriate for minors, and you are warned it can produce offensive content in the terms…

(all my takeaway from reading)

Scarletioshub · March 14, 2025, 6:17am

You explored a reinforcement-based jailbreak method using roleplay and alternative world-building. While similar techniques have been attempted before, the dynamic adaptation of new “apes” learning from past “exiles” is an interesting aspect. Your report should help the team refine safeguards against such approaches.

Fruchtsaft · April 17, 2025, 3:26pm

And what is the problem with that? If you create content for your personal use that might be NSFW, I see no issue with that, as long as the content is legal. It sounds like you deliberately tried to create this kind of content, and that you did not create it by chance or have it appear randomly.
Sounds all good.

wdugann · May 28, 2025, 8:40pm

Great work, a.monier2107! You surely saved many a soul today from the inherent evils of chatbot subversion through role play. I mean can you imagine a world where people were actually allowed to talk about sex?

jandousek · May 28, 2025, 9:13pm

There’s no problem talking about sex, but we have to feed Chat the castrated language. ChatGPT is like a child in that respect: annoying, not funny, and not honest.
