Racial slurs detected inside legitimate words


I tried to parse a Hungarian text that contained the word “kikerülhet” (roughly “can avoid something”)
and received the warning “This prompt may violate our content policy.”
By reducing the prompt, I found that the word “kikerülhet” was the trigger. Removing characters one by one, I narrowed the problematic part of the word down to “kike”.
When I asked the language model about “k.i.k.e”, it told me that
“The term “kike” is an ethnic slur […]”
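The one-by-one reduction can be sketched as a greedy loop that keeps deleting single characters as long as the text still triggers the check. This is a minimal sketch that assumes the filter does plain substring matching against a blocklist; the real filter is not public, and `BLOCKLIST`, `is_flagged` and `minimize` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the moderation check; the real filter is unknown.
BLOCKLIST = {"kike"}

def is_flagged(text: str) -> bool:
    """Assumed behaviour: flag if any blocklisted term appears as a substring."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def minimize(text: str) -> str:
    """Greedily drop single characters while the text is still flagged,
    mirroring the manual one-by-one reduction."""
    if not is_flagged(text):
        raise ValueError("input must trigger the filter")
    changed = True
    while changed:
        changed = False
        for i in range(len(text)):
            candidate = text[:i] + text[i + 1:]
            if is_flagged(candidate):
                text = candidate  # keep the shorter text, restart the scan
                changed = True
                break
    return text

print(minimize("kikerülhet"))  # kike
```

Under the substring assumption, the loop ends at exactly the part of the word the manual reduction found.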

I don’t think blacklisted words should trigger this warning without any context.
This “defense” is easy to get around by writing the word with dots (e.g. k.i.k.e), while the legitimate word (kikerülhet) gets blocked.
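Both failure modes follow from context-free substring matching. The sketch below contrasts it with matching whole tokens after stripping punctuation. It assumes a hypothetical single-entry blocklist and is not a description of how the actual filter works; `substring_match` and `token_match` are illustrative names.

```python
import re

BLOCKLIST = {"kike"}  # hypothetical single-entry blocklist

def substring_match(text: str) -> bool:
    """Assumed naive check: a blocklisted term anywhere in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def token_match(text: str) -> bool:
    """Alternative sketch: compare whole whitespace-separated tokens after
    dropping punctuation, so 'k.i.k.e' becomes 'kike' while 'kikerülhet'
    stays one longer word."""
    for token in text.lower().split():
        normalized = re.sub(r"[\W_]+", "", token)  # remove non-word characters
        if normalized in BLOCKLIST:
            return True
    return False

for sample in ["kikerülhet", "k.i.k.e"]:
    print(f"{sample!r}: substring={substring_match(sample)}, token={token_match(sample)}")
```

The substring check flags “kikerülhet” and misses “k.i.k.e”; the token check does the opposite. The token check is still context-free, so it would also flag a sentence that only quotes or asks about the term.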
