API Endpoints with Integrated Content Moderation

Should OpenAI introduce API endpoints with a built-in moderation feature?

Over the recent months, there has been a noticeable trend of many users experiencing access terminations due to unintended inappropriate inputs and/or outputs.

OpenAI already has a free solution for this called the moderation endpoint, which you can use to identify content that their usage policies prohibit and take action.

But should they introduce API endpoints for GPT-3.5T & 4 with this moderation feature built-in?


  1. Prevent Unintended Access Termination: With built-in safety checks, there might be fewer chances of outputs leading to access revocations.
  2. Safety by Default: Reducing the chances of potentially harmful, misleading, or inappropriate inputs or outputs.
  3. Ease of Implementation: Developers wouldn’t need to set up a separate moderation system or rewrite anything to use the moderation endpoint’s features.

Would love to hear the community’s thoughts on this. Do you think this could be a viable solution?


Azure already does this and it’s the biggest single complaint they receive.


Interesting, I was not aware of that, can you elaborate a bit?

I would prefer that they stop having such a zero-tolerance when it comes to abuse and instead have a warning system. People can and will enter their keys in untrustworthy places and run a risk of being permanently locked out of OpenAI services. These types of policies prevent curiosity.

The janitor fiasco is a good example. Although mainly used for … Certain conversations it was and isn’t known what the prompt even was to involve a mass banning.

As someone to writes the code I like the way it is. Yet, I’d think the majority of people just want a key for other services and don’t really understand or read best practices, safeties etc etc. So it’s completely unfair to give them access and expect them to know that even writing smut is a perma ban.

It’s low key madness. They seem to support BYOK for certain services but will permanently ban first-time for softcore smut. Talk about dancing on a razor’s edge.

I wouldn’t like the endpoints to change. I only use the moderation endpoint for untrusted input. For personal projects it’s unneeded. Mixing them together just seems to be the same, or even more work to manage. Specifically for error handling.

1 Like

Sure, it’s just that the Azure endpoints are “auto moderated” as you suggest and it’s a cause of problems with companies who use the API for internal, non public facing roles. Certain industrial and scientific terminology is often picked up as breaking the policy and is rejected, causing manual review processes to have to be enacted.

Also for literary applications and news use, life is often unpleasant and not allowing an AI to correct grammar or participate in the creative process at all is very restrictive, Sam Altman’s own words “I don’t want some machine telling me how to think”


Here’s the docs on Azure content filtering. They have some configuration around what level of moderation they perform, and you can apply for even more lax filtering.

I think if OpenAI felt the moderation endpoint was critical to chat usage, a “simple” solution would be to have the API send the request to moderation endpoints by default, but include a flag to skip it. Then in the documentation of the flag they can provide whatever warnings about account deletion.

Right now it’s kind of mixed messaging. The docs/API doesn’t seem to push it, but there’s dire consequences if you don’t use it. A warning system with messaging about the moderation endpoint would be a welcome change.


Thanks for the clarification :heart:

That could also be a good solution. To be clear, I’m not suggesting that any endpoints should be replaced. I’m suggesting adding a few new ones, such as gpt-3.5-turbo-mod for gpt-3.5-turbo, to make it easier for developers to comply with OpenAI’s policies.


I have a CMS and document authoring system which lets users collaborate with the AI to author documents, or to have general questions answered that the user types.

There’s been an ongoing discussion in another extremely lengthy thread in this forum about whether the developer is EXPECTED to ALWAYS call the moderation API to check every one of these potentially unsafe queries, before calling ‘completions’ api.

None of the OpenAI policy documents answer this very simple, and very necessary question directly and clearly.

EDIT: Furthermore if the answer is “yes you should” then this obviously needs to be a simple “moderate=true” parameter in the “completions API” call itself, which would obviously be defaulted to “true”, if they want to protect customers, which we assume they do.

Or… you can just call the moderation endpoint? Why mix different concerns?

What happens when a moderation error is thrown? I need to mix my error handling now? Expect two different types of objects? It doesn’t make sense. I have one function for calling the endpoint which appropriately handles the response, and then another. Nice and clean. Modular.

Keyword is potentially. If you are handling untrusted information then it makes sense to use the moderation endpoint. If you are, for example, writing sales pitches for your line of products then you don’t need it.

1 Like


Chat completion in itself is RLHFd to not do BAD stuff.

However some of the simplest and easiest things devs can do to prevent termination of their OpenAI API access because of abuse is:

  1. Check requests with Moderation API before sending them to the models.

  2. Pass the end-user IDs to the API when making any requests, this’ll help OpenAI know and communicate with you about users that are abusing the API via your application.
    This allows OpenAI to provide your team with more actionable feedback in the event that we detect any policy violations in your application.

Further best practices are listed here.


OpenAI didn’t support BYOK application.

These people never should have gotten API keys to begin with.

It was unfair of the users to get a developer API key.

It’s unfair of the developers to not read the documentation.


Agreed. It’s a key for developers. Yet, the official current stance on BYOK seems to be … undefined… We’ve seen numerous developers use BYOK for whatever reason.

Although I’m not going to point fingers I have seen some support for paid applications which also support a self-hosted BYOK version. I’d say that in this case it’s a great idea. Yet, I’d also say that in most cases it’s just bad practices.

The problem with this approach is the unnecessarily high frequency of requests hitting the endpoint. Such an approach will unnecessarily add to compute requirements.

Imagine abusive and normal usage requests hitting the same endpoint, the decision making happening on OpenAI side whether to pass the request for generation (think about the hold time), or deny them.

Then the complexity of API response; one response for denial while another entirely different for generation.

Compare this to the scenario where only filtered requests are hitting the generation endpoint (obviously no unnecessary ones) the decision making happening on devs’ systems.

The simplicity of response; devs know what kind of responses to expect from each endpoint.

+ Edit: Integrated approach will also hamper devs from bringing their own moderation system, which could be faster/better.


Communication is key.
Who should use the moderation endpoint when, how and what will happen if there are problems.
Then everybody can make their own choices.

I would leave things as they are because I think the rules are common sense.

The problem with making two API calls every time instead of one is the latency, and degrading performance for no reason.

I can see use cases where developers might have a need to call the ‘moderation’ endpoint standalone, with no intent to RUN the query if it passes moderation, but just because that’s true doesn’t mean that 100% of all of the most common usage patterns of the API should always require two back to back HTTP requests.

The obvious solution is a “moderate=true” flag on the completions endpoint, that defaults to true.

Yes indeed, the additional HTTP call adds latency.

I think one of the most common use cases is “chat bots” where apps take raw input from end users, and so 100% of those calls would need to be sent thru moderation (according to many peoples interpretations of the vague policy docs), and so if that’s the case the rational choice for API design is to have moderation built into the ‘completions’ call, as an option that defaults to true.

I’d like to say to the API, “block this if it violates OpenAI policy”, and don’t ding my API key. So maybe the more applicable parameter name is “ding=false”. lol.

This seems like a lot of extra work and compromises just to prevent mere milliseconds of latency.

When you are building a tool for all programmers the less assumptions & opinions the better.

1 Like

I can agree with that, I think it’s worth delving a bit into this.

This is absolutely true, let’s remember that the moderation endpoint is an AI classifier. It’s probably not as resource-intensive as GPT, but it will still require GPUs, and those are currently in short supply.

1 Like

Not just that. At the scale at which OpenAI receives requests per sec, even a slight hold can induce a huge backlog.

As the saying goes “the more moving parts, the more things can go wrong”.

1 Like

Fair point,

Although I think it’s worth mentioning that OpenAI is already passing the input, output, and generated title on ChatGPT through the moderation endpoint, so they do already know how to successfully put those cogs together.

1 Like