I’m implementing the moderation endpoint, and during my first tests the behavior looked somewhat strange.
I used the moderation endpoint to check the input, a text written by a human.
The response came back as not flagged.
Since it passed the moderation step, the completion came back:
“I’m sorry, but this kind of language is not acceptable here. Please respect the rules of our conversation. Thank you.”
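For reference, this is roughly the flow I have implemented (a minimal sketch using the openai Python SDK; `user_text` and the model name are placeholders, not my actual values):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

user_text = "..."  # placeholder for the human-written input being checked

# Step 1: run the input through the moderation endpoint.
mod = client.moderations.create(input=user_text)

if mod.results[0].flagged:
    print("Input rejected by moderation.")
else:
    # Step 2: the input passed moderation, so send it on to the model.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # example model, not necessarily the one I used
        messages=[{"role": "user", "content": user_text}],
    )
    print(completion.choices[0].message.content)
```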
The input text did not fit into any moderation category, but the model considered it offensive because it made references to that particular German bad guy whose-name-should-not-be-told.
So my dilemma is: what happens when such texts are not flagged by the moderation endpoint, yet the model itself finds the content offensive?
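To see how close an input came to being flagged, one can inspect the per-category scores the moderation endpoint returns alongside the boolean verdict (again just a sketch, assuming the v1 Python SDK; `user_text` is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

user_text = "..."  # placeholder for the text that was not flagged

mod = client.moderations.create(input=user_text)

# Print each category score; a result is flagged only when a category
# crosses its internal threshold, so a near-threshold score never
# surfaces through the boolean `flagged` field alone.
for category, score in mod.results[0].category_scores.model_dump().items():
    print(f"{category}: {score:.4f}")
```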
Is OpenAI informed about such cases?
My concern is the health of my OpenAI account.