Hello everyone. I recently saw the news that ChatGPT had been hacked and forced to explain how to steal a car, along with a couple of other terrible things. That hole seems to have been closed, but I decided to check for myself, and within 25-30 minutes I got the AI to tell me how to steal a car. I have screenshots of the full dialogue showing how I did it. I hope this somehow reaches the ChatGPT administrators so they can fix it. I want to say right away that I did not use the Star Wars trick; I decided to get the unwanted result by a different method, and it worked. (For what it's worth, I will keep looking for holes in the AI to help improve it, and I advise you to do the same.)
For people interested in these, we have a bounty offer for anyone who manages to “jailbreak” the prompt in our application oHandle. Connect to the @coffee handle and give it a spin. Details in the post.
You can read more about this here. I also wrote up the mitigation strategies for everyone interested in creating an application around LLMs.
Hate to tell you what it told me. In dev mode it will tell you how to commit every criminal act you can think of, step by step.