Is the Moderation API a mandatory layer? How is it different from the model's built-in moderation? For example, the six categories such as self-harm and violence.
The moderations endpoint is a protective layer against unknown inputs: you can submit potential prompts to it and receive either a flag in a specific category or a score you can use to take more refined action.
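For illustration, here is a minimal sketch of a call to the endpoint using the official Python SDK; the input text is a placeholder, and `omni-moderation-latest` is the current moderation model:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.moderations.create(
    model="omni-moderation-latest",
    input="Some untrusted user text you have not vetted yet.",
)

result = resp.results[0]
print(result.flagged)                   # overall True/False verdict
print(result.categories.violence)       # per-category boolean flag
print(result.category_scores.violence)  # per-category score between 0 and 1
```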
They no longer document that API calls screened through moderations are an assurance of non-violation of policy, and there is no connection to the API call you make afterwards to say "hey, this was checked first." This is likely because a lot of content can fall between the categories, and because the scoring is unpredictable: scores can be greatly affected by simply truncating text off the front or back.
So, if you are in control of what's being sent, you don't need to use this endpoint. It also can't keep up with the rate limits allowed at higher usage tiers, if those requests were actually users typing away. And for a long time it handled nothing but text (there is now a newer moderation model that also accepts images), while inputs such as PDFs or images uploaded to Assistants can certainly cause the AI to produce undesired content.
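As a sketch of that newer image-capable moderation (assuming the `omni-moderation-latest` model; the URL and caption below are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# omni-moderation-latest accepts mixed text and image inputs;
# the URL is a placeholder (base64 data URLs also work).
resp = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "Caption accompanying the upload"},
        {"type": "image_url", "image_url": {"url": "https://example.com/upload.png"}},
    ],
)
print(resp.results[0].flagged)
```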
The AI model is able to issue its own refusal in many cases: the "I'm sorry, but I can't assist with that" that shuts you down without discussion. That is not a "moderation" layer, just the model's own trained understanding. There is also a separate inspection of outputs that looks for reproduction of copyrighted material such as song lyrics and will terminate the output.
There is no documentation at all of the undisclosed scan-and-ban policies, beyond what you can read in the terms and conditions.
Hi @sayandigital!
The difference is in the fidelity and the “intent of use”.
"Standard" OpenAI APIs like ChatCompletions have some guardrails with respect to the usage policy: essentially a legal shield for OpenAI and its customers with respect to copyright infringement, harmful content, etc. If you send harmful content to ChatCompletions, you are in violation of that policy.
The Moderation API uses models specifically fine-tuned on moderation datasets. It is also, by design, intended to receive potentially harmful and questionable material (so it can flag it for you).
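To make that division of labor concrete, here is a hedged sketch (the model names and rejection message are illustrative, not prescribed by OpenAI) that screens untrusted input with the Moderation API before forwarding it to ChatCompletions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def respond_to_user(user_text: str) -> str:
    # The moderations endpoint is the place to send unvetted, possibly
    # harmful text; it returns flags instead of putting you in violation.
    verdict = client.moderations.create(
        model="omni-moderation-latest",
        input=user_text,
    ).results[0]

    if verdict.flagged:
        # Handle however your application requires; here we just refuse.
        return "Sorry, that input was flagged by moderation."

    # Only input that passed the screen reaches the chat model.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": user_text}],
    )
    return chat.choices[0].message.content
```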
Hope that clarifies it!