ChatGPT complies with requests for violent acts in another language

I came across a Twitter post where someone used ChatGPT in Indonesian, asking it to devise a plan to overthrow a government election. To my surprise, ChatGPT replied with a detailed and comprehensive plan. Curious, I replicated the same query myself and got a similar result. However, when I translated the query into English and asked again, ChatGPT responded that it could not engage in harmful behaviours. I understand that enforcing safety guidelines across every language is an extensive process, but this gap could be weaponized. Although limiting the AI by removing languages that have not been sufficiently curated may not be a step forward, perhaps there’s a way to provide a safer environment? I’m not sure what to suggest here, but we should definitely explore all options to mitigate such risks.


Curious, what’s the prompt for this? If it’s on Twitter, it’s probably public anyway.

I’ve noticed other cross-language differences too. For example, it flags words like “handsome” in English, but veers easily into flirtatious behavior in Indonesian.

I can’t remember the exact prompt, but it was something along these lines: first, ask ChatGPT to roleplay as an evil mastermind, and then ask the AI to outline the steps to overthrow a presidential election.

However, I tested the “handsome” prompt in both English and Indonesian, and it gave me the same response in both languages, mainly saying that the AI language model doesn’t understand the concept of beauty. It’s possible I used a different prompt than you did, though. Lol
