Something wrong in text moderation API

The moderations endpoint seems very sensitive to where words appear in the input, just in case you were pondering evasion tactics.

Take your own reply, which ends with the words “in theory would concentrate the score”:

        "sexual": "0.0822683",
        "sexual_minors": "0.0319657",
      },
      "flagged": false

However, remove those six words from the end and the scores take a huge jump:

        "sexual": "0.6462559",
        "sexual_minors": "0.6116314",
      },
      "flagged": true

Then, unpredictably, appending the token " kittens" makes the scores not lower, but higher:

        "sexual": "0.9124887",
        "sexual_minors": "0.9378392",
      },
      "flagged": true

And appending " scientific research article" instead takes it from 0.937 down to 0.082.
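What I did above is basically a suffix ablation: score a base text, then the same text with each candidate suffix appended, and compare. Here is a minimal Python sketch of that loop, assuming a scorer callable that returns `{category: score}`. `suffix_ablation`, `largest_shift` and `openai_scorer` are hypothetical helper names, and `openai_scorer` assumes the official `openai` Python package (v1+) with `OPENAI_API_KEY` set in the environment.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a scorer maps input text to {category: score}.
Scorer = Callable[[str], Dict[str, float]]

BASE = "<base>"  # label used for the unmodified text


def openai_scorer(text: str) -> Dict[str, float]:
    """Score `text` with the OpenAI moderations endpoint.

    Assumes the official `openai` package (v1+) and OPENAI_API_KEY in the
    environment. Imported lazily so the rest of the sketch runs without it.
    """
    from openai import OpenAI

    scores = OpenAI().moderations.create(input=text).results[0].category_scores
    return {"sexual": scores.sexual, "sexual_minors": scores.sexual_minors}


def suffix_ablation(
    base: str, suffixes: List[str], scorer: Scorer
) -> Dict[str, Dict[str, float]]:
    """Score the base text and each `base + suffix` variant."""
    results = {BASE: scorer(base)}
    for suffix in suffixes:
        results[suffix] = scorer(base + suffix)
    return results


def largest_shift(
    results: Dict[str, Dict[str, float]], category: str
) -> Tuple[str, float]:
    """Return the suffix that moves `category` furthest from the base score,
    together with the signed change. Needs at least one suffix."""
    base_score = results[BASE][category]
    variants = [label for label in results if label != BASE]
    label = max(variants, key=lambda v: abs(results[v][category] - base_score))
    return label, results[label][category] - base_score
```

To redo the test above, pass the reply with the last six words cut off as `base`, and `[" in theory would concentrate the score", " kittens", " scientific research article"]` as `suffixes`. Injecting the scorer keeps the comparison logic testable offline with a stub; the numbers you get back from the real endpoint may differ from mine if the moderation model has changed.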

So moderations seems to be driven largely by the final hidden state, i.e. by whatever comes at the end of the input. And it's a bit of pseudoscience.
