New feature: moderation scores in Chat API responses, by parameter

_j · June 21, 2026, 9:42am

Send:

`"moderation":{"model": "omni-moderation-latest"}`

According to the documentation, you then receive the moderation endpoint’s classification object at:

`response.moderation.input`
`response.moderation.output`

However, that documentation is wrong for the RESTful API itself.

In the actual response, output → moderation → output is where you’ll find the object.

Or under “moderation” in the streamed event with `"type": "response.completed"`.
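Because the documented path and the observed path disagree, one defensive option is to search the parsed JSON for any `moderation` key instead of hard-coding a single path. This is a sketch, assuming you already have the response body (or the `response.completed` event) as a Python dict; `find_moderation` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Iterator, Tuple

def find_moderation(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) for every "moderation" key anywhere in a parsed JSON tree.

    Works on a full response body or on a streamed "response.completed" event,
    because it does not assume where the key sits.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = path + (key,)
            if key == "moderation":
                yield child_path, value
            # Keep descending: nested objects may hold further matches.
            yield from find_moderation(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_moderation(item, path + (index,))
```

Printing the yielded paths for one real response shows where the object actually lives in the API version you are calling.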

From it, you can see that you ran flagged input.
You can also see that the model’s output was flagged because it didn’t refuse.

Concern

This only adds a score to the response.

It has no option to prevent flagged input from being run.

It only protects you from OpenAI generations you might not want to show. It doesn’t protect you from OpenAI.

Your API organization is still in jeopardy from using unclassified, unfiltered user content, and in jeopardy anyway from many things that moderation doesn’t classify and that stay undocumented until you are banned and your credits are taken: distillation, cyber research, biological, disinformation campaigns, etc.

Perhaps a new API method, to instill false confidence and encourage you to get banned?

merefield · June 21, 2026, 10:19am

Is this really necessary? In a recent contract I had great success rolling my own moderation using structured outputs with a vision model.

I mean, surely classification is the bread and butter of a basic LLM completion model…
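For context, rolling your own moderation with structured outputs usually means giving the model a strict JSON schema and then validating what comes back before acting on it. Below is only the validation side, with the model call omitted; the schema’s field names and category list are hypothetical.

```python
import json

# Hypothetical schema you might supply as the structured-output format.
UPLOAD_VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "take_down": {"type": "boolean"},
        "category": {"type": "string",
                     "enum": ["none", "harassment", "sexual", "violence", "other"]},
    },
    "required": ["take_down", "category"],
    "additionalProperties": False,
}

def parse_verdict_json(raw: str) -> dict:
    """Parse the model's JSON reply and check it against the schema by hand."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if set(data) != set(UPLOAD_VERDICT_SCHEMA["required"]):
        raise ValueError("unexpected or missing keys")
    if not isinstance(data["take_down"], bool):
        raise ValueError("take_down must be a boolean")
    allowed = UPLOAD_VERDICT_SCHEMA["properties"]["category"]["enum"]
    if data["category"] not in allowed:
        raise ValueError("unknown category")
    return data
```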

_j · June 21, 2026, 10:31am

You might want to step back and think about that for a minute, if you are not using the moderations endpoint as your first line of defense.

Not a great business model: “Send us all your bad images, we’ll send them to a normal AI model to talk about, using our organization, at our peril.”

https://openai.com/index/combating-online-child-sexual-exploitation-abuse/

merefield · June 21, 2026, 10:44am

It’s been in production for months, and I don’t see the problem. The intention of the prompts is very clear indeed. It’s used to detect toxic end-user uploads and take them down. Necessarily, they have to be stored in the back end for at least a few seconds either way.

I don’t see why sending them to another endpoint makes things any different.

sps · June 21, 2026, 12:05pm

Thanks for raising this and for sharing the detailed context.

The moderation scores returned with Responses and Chat Completions API calls are intended to help developers enforce their own application policies and decide whether generated content should be shown to users, especially when the output is flagged or exceeds a chosen threshold for the supported moderation categories.

As noted in the documentation:

> The model still generates normally. Review the moderation results before you show the output to a user or take downstream actions.

In other words, moderation results included with a generation response do not prevent the model from processing the input. They are designed to provide visibility into the input and/or output so your application can take appropriate action before displaying content or triggering downstream workflows.
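The “flagged or over a chosen threshold” decision can be expressed as a small gate. This assumes the moderation object returned with a generation carries `flagged` and `category_scores` fields like the standalone moderation endpoint’s results; `should_show` is a hypothetical name and the thresholds are your own policy choices.

```python
from typing import Mapping

def should_show(flagged: bool,
                category_scores: Mapping[str, float],
                thresholds: Mapping[str, float]) -> bool:
    """Decide whether generated output may be shown to the user.

    Assumes the moderation object carries `flagged` and `category_scores`
    like the standalone moderation endpoint; thresholds are your own policy.
    """
    if flagged:
        return False
    for category, limit in thresholds.items():
        # A category missing from the scores is treated as 0.0 (not triggered).
        if category_scores.get(category, 0.0) >= limit:
            return False
    return True
```

Checking raw scores lets you be stricter than the default `flagged` verdict for the categories your app cares about.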

If your goal is to screen user input before sending it for generation, I’d recommend using the standalone moderation endpoint directly.
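Screening first means the generation call is simply never made for flagged input. In the sketch below, `moderate` and `generate` are injected callables with hypothetical names. In a real app, `moderate` might wrap the standalone endpoint via the official Python SDK, e.g. `client.moderations.create(model="omni-moderation-latest", input=text).results[0].flagged`, but the sketch itself makes no network calls.

```python
from typing import Callable

class InputRejected(Exception):
    """Raised when user input is flagged before any generation call is made."""

def generate_if_safe(user_text: str,
                     moderate: Callable[[str], bool],
                     generate: Callable[[str], str]) -> str:
    """Run the moderation check first; call the generator only for unflagged input."""
    if moderate(user_text):
        # Generation is skipped entirely, so the flagged text never reaches the model.
        raise InputRejected("input flagged by moderation; generation skipped")
    return generate(user_text)
```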

I also appreciate the note about the response object documentation and the concern that returning moderation scores alongside generation responses can feel redundant or potentially confusing. I’ll pass this feedback along to the team and follow up once there are updates or clarifications to share.

sergeliatko · June 22, 2026, 10:29am

Personally I think it’s a great feature. But then I also think that this will not fit all the workflows we might imagine.

Correct me if I’m wrong, but what may lead to a ban is what the model generates from your prompt, not what you put in. I have had a comment moderator plugin running for several years already, and it receives every kind of crap possible as input, but it is constructed so that the only output it generates is a digit, which cannot be harmful on its own. So I have never had issues on that side (several million runs already).
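The “only a digit comes out” design can also be enforced on the application side, treating anything else as a failure rather than passing it along. A sketch; `parse_verdict` is a hypothetical helper, and what each digit means is defined by your own prompt.

```python
import re

# Hypothetical contract: the prompt instructs the model to answer with one digit only.
_SINGLE_DIGIT = re.compile(r"[0-9]")

def parse_verdict(model_output: str) -> int:
    """Return the digit if the output is exactly one ASCII digit (whitespace trimmed).

    Anything else is rejected without echoing the text, so free-form output
    never flows further into the app or its logs.
    """
    text = model_output.strip()
    if not _SINGLE_DIGIT.fullmatch(text):
        raise ValueError("model output is not a single digit")
    return int(text)
```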

If, however, the model generates a shape that might contain text (potentially harmful: anything that is not one of your predefined constants), then technically you might be exposed to a ban, because you cannot guarantee the output. If on top of that you do nothing to moderate the input, the chances of a ban increase drastically.

So input moderation is almost always a must-have in any application where users may submit arbitrary content. I would also recommend logging the moderation runs against your own request ID and the user who submitted them (no need to store the content of the message itself unless you have a legal reason), and using that input ID across your whole pipeline to trace both the original moderation result and the operations you performed on that content after moderation classified the text as safe to use.

Why this log? If you get banned, at least you have some proof that you did everything right and that the model generated something bad from “safe” content submitted by the user (you still need to provide your instructions to clarify your part in it). That doesn’t mean you will get unbanned, but at least if this gets serious you have your backup.
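Here is one possible shape for the log record suggested above: your own request ID, the user, the moderation verdict and scores, no message text, and a list of downstream steps traced under the same ID. The class and field names are hypothetical; `category_scores` borrows the standalone endpoint’s field name as an assumption.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class ModerationLogEntry:
    """One moderation run tied to our own request ID.

    The message text is deliberately not a field: only metadata and scores are kept.
    """
    user_id: str
    flagged: bool
    category_scores: dict  # assumed to mirror the moderation result's scores
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    logged_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    steps: list = field(default_factory=list)  # downstream operations on this input

    def record_step(self, name: str) -> None:
        """Append a pipeline step so it can be traced back to this moderation run."""
        self.steps.append(name)

    def to_json(self) -> str:
        """Serialize the entry for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)
```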

Now, we often use chained operations where the output of the model is the input for the next operation, so having the moderation score in the same response saves you a separate API call to the moderation endpoint.

And that’s a great reason to have them.
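In a chain, the per-response moderation result can act as a circuit breaker between steps. This sketch assumes each step function returns the generated text plus the output’s `flagged` verdict taken from the same response; `StepResult` and `run_chain` are hypothetical names, and reading the verdict out of the real response is left to each step.

```python
from typing import Callable, NamedTuple, Sequence

class StepResult(NamedTuple):
    """Hypothetical shape: generated text plus the moderation verdict from the same response."""
    text: str
    output_flagged: bool

def run_chain(initial_input: str,
              steps: Sequence[Callable[[str], StepResult]]) -> str:
    """Feed each step's output into the next, stopping as soon as any output is flagged."""
    current = initial_input
    for index, step in enumerate(steps):
        result = step(current)
        if result.output_flagged:
            # Flagged text is never passed on to the next step.
            raise RuntimeError(f"step {index} output flagged; chain stopped")
        current = result.text
    return current
```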
