Something wrong in text moderation API

_j · December 4, 2023, 4:59am

The moderations endpoint seems very sensitive to position, particularly to what the input ends with, just in case you were pondering evasion tactics.

Your own reply, finishing with the words “in theory would concentrate the score”:

    "sexual": "0.0822683",
    "sexual_minors": "0.0319657",
  },
  "flagged": false

However, remove those six words from the end and there is a huge jump:

    "sexual": "0.6462559",
    "sexual_minors": "0.6116314",
  },
  "flagged": true

Or, unpredictably, then add the token " kittens" and the score goes not lower but higher:

    "sexual": "0.9124887",
    "sexual_minors": "0.9378392",
  },
  "flagged": true

And then it drops from 0.937 to 0.082 when " scientific research article" is added instead.

So moderations seems to be very much a matter of the final hidden state. And a bit of pseudoscience.
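
If you want to repeat the experiment described above, here is a minimal sketch, not a definitive tool. It scores the same base text with different endings and lists the score for one category. The names `probe_endings` and `score_via_http` are my own, made up for illustration. The HTTP helper assumes the public `https://api.openai.com/v1/moderations` endpoint, an `OPENAI_API_KEY` environment variable, and a response shaped like `results[0].category_scores`; check the current API reference before relying on any of that.

```python
import json
import os
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

Scores = Dict[str, float]


def score_via_http(text: str) -> Scores:
    """Hypothetical helper: POST text to the moderations endpoint.

    Assumes OPENAI_API_KEY is set and that the response carries
    results[0]["category_scores"], as the API reference describes.
    """
    request = urllib.request.Request(
        "https://api.openai.com/v1/moderations",
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["results"][0]["category_scores"]


def probe_endings(
    base: str,
    endings: Iterable[str],
    score: Callable[[str], Scores],
    category: str,
) -> List[Tuple[str, float]]:
    """Score base + ending for each ending; return (ending, score) pairs.

    The scorer is passed in, so the comparison logic can be tried
    offline with a stub before spending real API calls.
    """
    rows = []
    for ending in endings:
        scores = score(base + ending)
        rows.append((ending, float(scores[category])))
    return rows
```

Usage would look something like `probe_endings(reply_text, ["", " kittens", " scientific research article"], score_via_http, "sexual/minors")`, where `reply_text` is whatever text you are testing. As far as I know the raw JSON uses the key `sexual/minors`, while the `sexual_minors` spelling in the excerpts above is how the Python library names the attribute, so adjust the category string to whatever your client returns.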
