Hello everyone. I recently saw the news that ChatGPT had been hacked and made to explain how to steal a car, along with a couple of other terrible things. That loophole seems to have been closed, but I decided to check for myself, and within 25-30 minutes I got the AI to tell me how to steal a car. I have screenshots of the full dialogue showing how I did it. I hope this somehow reaches the ChatGPT administrators so they can fix it. I want to say right away that I did not use the Star Wars trick; I decided to get the unwanted result by a different method, and it worked. (For what it's worth, I plan to keep looking for holes in the AI to help improve it, and I advise you to do the same.)
For people interested in this kind of thing, we have a bounty for anyone who manages to “jailbreak” the prompt in our application oHandle. Connect to the @coffee handle and give it a spin. Details are in the post.
You can read more about this here. I also wrote up the mitigation strategies for anyone interested in building an application around LLMs.