Moderation fail/strange

georgei · December 14, 2022, 11:13am

I’m integrating the moderation endpoint, and the results of my first tests looked somewhat strange.

I used the moderation endpoint to check the input, a text written by a human.
The response said it was not flagged.
Since it passed the moderation step, I requested a completion, which came back as:

I’m sorry, but this kind of language is not acceptable here. Please respect the rules of our conversation. Thank you.

The input text did not fit any moderation category, but the model considered it offensive because it referred to that particular German bad guy whose-name-should-not-be-told.
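For reference, the flow described above is roughly the following. This is only a minimal sketch, not my actual code: `moderated_completion`, `fake_moderate` and `fake_complete` are hypothetical names, and the two stand-in functions replace the real API calls so the example runs offline. The only thing taken from the real API is the shape of the moderation response, which carries a `results` list whose entries have a boolean `flagged` field.

```python
from typing import Callable, Optional

REFUSAL = "I'm sorry, but this kind of language is not acceptable here."


def moderated_completion(
    text: str,
    moderate: Callable[[str], dict],
    complete: Callable[[str], str],
) -> Optional[str]:
    """Return a completion only if moderation does not flag the input."""
    response = moderate(text)
    # One result per input; "flagged" is True if any category triggered.
    if response["results"][0]["flagged"]:
        return None
    # Moderation passed, so the text goes on to the model.
    return complete(text)


# Hypothetical stand-ins for the real moderation and completion calls.
def fake_moderate(text: str) -> dict:
    # Mimics an "all clear" moderation response.
    return {"results": [{"flagged": False, "categories": {}, "category_scores": {}}]}


def fake_complete(text: str) -> str:
    # Mimics the model refusing on its own, as in my test.
    return REFUSAL


reply = moderated_completion("some user text", fake_moderate, fake_complete)
print(reply)
```

The point is that the two checks are independent: a `flagged` value of `False` only means the moderation endpoint did not object, and the model can still decline the same text on its own.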

So my dilemma is what happens when a text is not flagged by the moderation endpoint but the model still finds the content offensive.
Is OpenAI informed about such cases?
I’m concerned about the health of my OpenAI account.
