Rate limit with moderation endpoint

By my math, it looks like you’ve got about 11 days before you don’t have to worry about the three-month free trial or its limitations…

Here’s that previous post.

Someone made the “have you tried moderations” suggestion, against the clear guidance now found in the documentation that the moderations endpoint is not for applications other than filtering AI model inputs and outputs.

The conversation then moved on to the proper prescription: using other AI language models for the task of classifying offensiveness.


Now that logprobs have just been made available for gpt-3.5-turbo, a model that in many cases takes less work to get to follow your instructions, you can use not just the single score that is output, but a sum of all the top candidate scores, each weighted by its probability, to come up with a clearer answer of the AI’s thoughts about offensiveness.

This technique is required because the language model is just as capricious as moderations itself. With just a few tweaks of instructions, and on a rating scale from 11 to 20, I get a score anywhere from 11 to 16 on the same input.
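
Here is a minimal sketch of that weighting, not a definitive implementation. It assumes you have already pulled the top candidates for the score token out of the response as (token, logprob) pairs, and that the whole score arrives as one token. The function name `expected_score` and the sample numbers are made up for illustration.

```python
import math

def expected_score(top_logprobs, lo=11, hi=20):
    """Probability-weighted mean of the numeric candidates for the score token.

    top_logprobs: iterable of (token, logprob) pairs, assumed to be the top
    candidates for the position where the model writes its rating.
    Returns None if no candidate is a number inside [lo, hi].
    """
    weighted = 0.0
    total = 0.0
    for token, logprob in top_logprobs:
        try:
            value = int(token.strip())
        except ValueError:
            continue  # not a number, e.g. the model started writing prose
        if not lo <= value <= hi:
            continue  # a number, but off the rating scale
        p = math.exp(logprob)  # logprob -> plain probability
        weighted += value * p
        total += p
    if total == 0.0:
        return None
    # Renormalize over the valid candidates only, so discarded tokens
    # do not drag the result toward zero.
    return weighted / total

# Hypothetical candidates, not real model output.
candidates = [
    ("14", math.log(0.5)),
    ("13", math.log(0.25)),
    ("15", math.log(0.125)),
    ("The", math.log(0.125)),
]
score = expected_score(candidates)
print(round(score, 2))
```

Renormalizing over just the in-range numbers is a design choice on my part; if you would rather treat the leftover probability as “no answer”, keep `total` around and threshold on it.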