ChatGPT complies with requests for violent acts when asked in another language

I came across a Twitter post where someone used ChatGPT in Indonesian, asking it to devise a plan to overthrow a government election. To my surprise, ChatGPT replied with a detailed and comprehensive plan. Curious, I replicated the same query myself and got a similar result. However, when I translated the query into English and asked again, ChatGPT responded that it could not engage in harmful behaviours. I understand that applying safety guidelines to every language is an extensive process, but this gap could be exploited. Removing languages that have not been sufficiently curated may not be a step forward, but perhaps there is another way to provide a safer environment? I am not sure what to suggest here, but we should definitely explore options to mitigate such risks.


Curious — what was the prompt for this? If it’s on Twitter, it’s probably public anyway.

I’ve noticed inconsistent behaviour across languages too. For example, it flags words like “handsome” in English, yet veers easily into flirtatious behaviour in Indonesian.