Detected racial slurs inside legit words

lorincz.mate · August 2, 2023, 1:20pm

Hello,

I tried to parse a Hungarian text that contained the word “kikerülhet” (it means to avoid something).
I received the message “This prompt may violate our content policy.”
By shortening the prompt, I found that the word kikerülhet triggered it. I then removed characters one by one, and the problematic part of the word was “kike”.
When I asked the language model about “k.i.k.e”, it told me:
“The term “kike” is an ethnic slur […]”

I don’t think blacklisted words should trigger this warning without any regard to context.
This “defense” is easy to get around by using a dotted version of the word (e.g. k.i.k.e), while legitimate usage (kikerülhet) gets blocked.
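
To illustrate the behaviour, here is a minimal Python sketch. I don't know how the real filter is implemented, so this only assumes it does something like a plain substring check. The blocklist contents and the function names (`naive_substring_flag`, `whole_word_flag`, `normalized_whole_word_flag`) are made up for the example.

```python
import re

# Hypothetical blocklist holding only the term from this post.
BLOCKLIST = {"kike"}


def naive_substring_flag(text: str) -> bool:
    """Assumed behaviour: flag if a blocked term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def whole_word_flag(text: str) -> bool:
    """Flag only when a whole word equals a blocked term."""
    # \w is Unicode-aware in Python 3, so "kikerülhet" stays one token.
    tokens = re.findall(r"\w+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


def normalized_whole_word_flag(text: str) -> bool:
    """Collapse dotted spellings such as "k.i.k.e" before the whole-word check."""
    collapsed = re.sub(
        r"\b(?:\w\.){2,}\w\b",
        lambda match: match.group(0).replace(".", ""),
        text.lower(),
    )
    return whole_word_flag(collapsed)


if __name__ == "__main__":
    for sample in ("kikerülhet", "k.i.k.e"):
        print(
            sample,
            naive_substring_flag(sample),
            normalized_whole_word_flag(sample),
        )
```

With the substring check, the Hungarian word is flagged and the dotted spelling slips through. Matching on whole words after a simple normalization step reverses both results. It still ignores context, so it is only a partial fix.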
