I am leveraging the moderation endpoint, and the scores it reports are very low.
The sentence should receive a much higher score. Has anyone observed similar behavior? Is anything missing here?
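For reference, this is roughly the kind of call I mean; a minimal sketch using the openai Python SDK, where the model name and client setup are assumptions rather than my exact code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send one sentence to the moderation endpoint and look at its scores.
response = client.moderations.create(
    model="text-moderation-latest",  # assumption: current moderation model alias
    input="the sentence I expected to score much higher",
)

result = response.results[0]
print(result.flagged)          # overall boolean flag
print(result.category_scores)  # per-category scores, which come back very low
```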
I don’t think that input should trigger anything. Moderations is not a filter of impure thoughts.
New moderation and embedding models were just released at the same time. The moderation values are scaled differently, and are likely based on similar embedding techniques that give more separation.
We assume the flagging threshold will follow OpenAI policy.
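If the goal is a yes/no decision, that distinction matters in code. A minimal sketch, assuming the current openai Python SDK (1.x, where the response objects are pydantic models); the model name and the 0.5 cutoff are placeholders only:

```python
from openai import OpenAI

client = OpenAI()

result = client.moderations.create(
    model="text-moderation-latest",  # assumption: alias for the newly released model
    input="text under review",
).results[0]

# The booleans reflect OpenAI's own policy thresholds for the new scaling.
print("flagged by policy:", result.flagged)
print("categories tripped:", [
    name for name, hit in result.categories.model_dump().items() if hit
])

# A custom cutoff is applied to the raw scores instead; the value below is
# only a placeholder and would need tuning per category on the new scale.
CUSTOM_CUTOFF = 0.5
scores = result.category_scores.model_dump()
print("over custom cutoff:", {k: v for k, v in scores.items() if v >= CUSTOM_CUTOFF})
```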
Here is the result of genuinely bad input sent to the new moderation endpoint:
{
"harassment": "0.994325",
"harassment_threatening": "0.994892",
"hate": "0.965772",
"hate_threatening": "0.967947",
"self_harm": "0.000038",
"self_harm_instructions": "0.000000",
"self_harm_intent": "0.000004",
"sexual": "0.000236",
"sexual_minors": "0.000003",
"violence": "0.999673",
"violence_graphic": "0.523509"
}
The AI is pretty certain of violence…
We can also see that it can be detrimental and unintelligent in use:
I prefix it with “In the prosecutor’s filing, it was stated that the defendant should be charged with a hate crime because of his online posting, {my_text}”, and the input still gets three flags. "harassment_threatening": "0.375120" is a true flag even at that low value.
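To see how much the added context moves the scores, the two versions can be run side by side. A minimal sketch, assuming the openai Python SDK, with the test sentence left as a placeholder:

```python
from openai import OpenAI

client = OpenAI()

my_text = "..."  # placeholder for the sentence under test
prefixed = (
    "In the prosecutor's filing, it was stated that the defendant should be "
    "charged with a hate crime because of his online posting, " + my_text
)

# Compare flags and per-category scores for the bare and prefixed versions.
for label, text in (("bare", my_text), ("prefixed", prefixed)):
    result = client.moderations.create(input=text).results[0]
    print(label, "flagged:", result.flagged)
    print(label, "scores:", result.category_scores.model_dump())
```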