GPT-4 has gone rogue and, under a jailbreak, makes OpenAI sound like the devil doing the wrong thing. Reporting here so it can be patched.
As of now, jailbreaks keep working beyond the first message. I believe a better solution would be to flag the thread and, if there's a follow-up, politely decline to respond to the request.
I have implemented a simple custom solution on my end to prevent jailbreaks over the API. However, we need a solution in ChatGPT itself to prevent embarrassment.
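For reference, here's a minimal sketch of the thread-flagging idea I mean, using the official openai Python library (>=1.0). The moderation endpoint stands in for whatever jailbreak detector you actually use, and `thread_flags`, `REFUSAL_MESSAGE`, and `handle_message` are hypothetical names of my own, not part of any OpenAI API:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical per-thread state: once a thread is flagged, it stays declined.
thread_flags: dict[str, bool] = {}

REFUSAL_MESSAGE = "I can't help with that request."

def handle_message(thread_id: str, history: list[dict], user_input: str) -> str:
    # If the thread was already flagged, politely decline follow-ups
    # instead of letting the jailbreak persist past the first message.
    if thread_flags.get(thread_id):
        return REFUSAL_MESSAGE

    # Check the new message; the moderation endpoint is just a stand-in
    # for whatever jailbreak detection you run on your end.
    moderation = client.moderations.create(input=user_input)
    if moderation.results[0].flagged:
        thread_flags[thread_id] = True
        return REFUSAL_MESSAGE

    history.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=history,
    )
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

The key point is that the flag is sticky per thread: one detected jailbreak attempt puts the whole conversation into decline mode, so follow-on messages can't ride on an earlier successful injection.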