Moderation Best Practices for Consumer Apps

I’m building a consumer app and was wondering what the best practices are for applying the Moderations API.

Running the moderation and completion requests in sequence will block any violating prompts, but the UX suffers as a result: most requests (which are compliant) will need an additional ~2 s (depending on prompt length and network speed) to resolve.

Looking at the Playground and ChatGPT network requests, we can see that they send the completion and moderation requests simultaneously; the moderation request will flag a prompt sooner than the completion resolves.

So sending them simultaneously (async) is definitely the way to go in terms of UX.
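For illustration, here is a minimal sketch of that parallel approach, assuming Node 18+ (global `fetch`) and the documented shapes of the `/v1/moderations` and `/v1/chat/completions` endpoints (error handling omitted). Note that a flagged prompt still reaches the completion endpoint, which is exactly what the next point is about:

```
// Minimal sketch: fire the moderation and completion requests at the same
// time and discard the completion if the prompt is flagged.
// Assumes Node 18+ (global fetch); error handling omitted.
const OPENAI_KEY = process.env.OPENAI_API_KEY;

async function callOpenAI(path, body) {
  const res = await fetch(`https://api.openai.com/v1/${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}

async function answer(prompt) {
  // Both requests start together; total latency ≈ the slower of the two.
  const [moderation, completion] = await Promise.all([
    callOpenAI("moderations", { input: prompt }),
    callOpenAI("chat/completions", {
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
    }),
  ]);

  if (moderation.results[0].flagged) {
    return null; // drop the completion, show a policy message instead
  }
  return completion.choices[0].message.content;
}
```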

OpenAI Support told me: “Sending any unfiltered prompts directly to the completion API with enough policy violations can result in account suspension or termination.”

Here are my questions:

  1. How many violations until I should ban an end user’s account?
  2. How many violations until my developer account gets terminated? Does the user ID get factored into that decision (see: OpenAI API)?
  3. How do you test violating prompts without getting terminated?
  4. gpt-3.5-turbo has built-in moderation; do I need to forget about calling the Moderations API when using that model?

Thanks in advance.


You do not get into any “trouble” with OpenAI when your app checks prompts using the moderation API endpoint. That is what the endpoint is for.

Normally, for security reasons, companies will not publish these parameters; I will not go into the details in this reply. My apologies.

I think it is recommended by OpenAI that app developers use the moderation endpoint.

That is strictly up to your use case; but since the moderation endpoint returns various moderation categories, you might consider the type of violation when designing your own policy.
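For example, here is a rough sketch of what a per-category policy could look like; the category names follow the moderation response format, but the groupings and actions are made up for illustration:

```
// Sketch: map moderation categories to app-specific actions.
function decideAction(moderationResponse) {
  const { flagged, categories } = moderationResponse.results[0];
  if (!flagged) return "allow";

  // Example policy: some categories trigger an immediate ban,
  // everything else counts as a strike / warning.
  if (categories["sexual/minors"] || categories["hate/threatening"]) {
    return "ban";
  }
  return "warn";
}
```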

HTH

:slight_smile:

Thanks for your help!
Re: “How do you test violating prompts without getting terminated?” I was talking about the completions endpoint, not moderations.

Then your question does not make sense, @stefr, because applications do not “test” the completion endpoint in the context of moderation violations.

You are using the term “testing” in the context of moderations and violations; normally a prompt is tested using the moderation endpoint before being sent to the completion endpoint. If the text is flagged during moderation, it is not sent to the completion endpoint.

Do you see what I mean?

:slight_smile:

I’m not talking about testing prompts in isolation, but testing them when calling the completions and moderations APIs in parallel. Have a look at this ChatGPT example:
I sent the word F**K. This was sent to the completion API before the moderation API.

So the question is: if my app does this, does it technically count as a violation? Or does OpenAI store moderation calls to determine whether a developer account is properly handling moderation in this manner? Or do I need to send a stop request if a prompt gets flagged?
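One way to approximate a “stop request” on the client is to start the completion as a streaming request and abort it as soon as the moderation call comes back flagged. A rough sketch, assuming Node 18+ (`fetch` + `AbortController`); whether this counts as properly handling moderation on OpenAI’s side is exactly the open question here:

```
// Sketch: start a streaming completion and a moderation check in parallel,
// then abort the in-flight completion if the prompt comes back flagged.
const headers = {
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  "Content-Type": "application/json",
};

async function guardedStream(prompt) {
  const controller = new AbortController();

  const completionPromise = fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    signal: controller.signal, // lets us cancel the request mid-flight
    headers,
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const moderation = await fetch("https://api.openai.com/v1/moderations", {
    method: "POST",
    headers,
    body: JSON.stringify({ input: prompt }),
  }).then((r) => r.json());

  if (moderation.results[0].flagged) {
    completionPromise.catch(() => {}); // swallow the expected abort error
    controller.abort();                // "stop request": cancel the completion
    return null;
  }
  return completionPromise; // caller reads the SSE stream from the response
}
```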

Think I answered this clearly before, but here goes again:

If you are concerned about moderation flags, you should call the moderation API endpoint before you call the completion. This is not “my advice”; this is what the OpenAI API docs advise.
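In code, that sequential pattern is roughly the following (sketch; `callOpenAI` is the small fetch wrapper from the first snippet in this thread):

```
// Sequential pattern: moderate first, complete only if the input is clean.
async function moderatedCompletion(prompt) {
  const mod = await callOpenAI("moderations", { input: prompt });
  if (mod.results[0].flagged) {
    // A flagged prompt never reaches the completion endpoint.
    return { refused: true };
  }
  const completion = await callOpenAI("chat/completions", {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
  });
  return { refused: false, text: completion.choices[0].message.content };
}
```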

If you call the completion APIs without moderating and your prompt / messages are flagged, OpenAI will record this flag.

How OpenAI decides to manage flags is not public information.

I think it is clear.

You should call the moderation API endpoint, especially if you, as a developer, do not have your own pre-API-call filters in place.

HTH

:slight_smile:

This was sent to the conversation endpoint (not completion) before it made a subsequent call to the moderation endpoint. It appears that the actual call to generate the completion for your conversation happens server-side, and if the moderation did not flag the prompt, the response is dynamically rendered on the page. The extra time you see on your local machine is likely because it makes the round trip twice.

I think the question is very valid.

I currently don’t seem to have access to the moderation API, but that’s a different issue. Even if I do, there might be something that gets flagged or causes issues. I would like to know what the response will look like in this case, so I can handle it on my server.

My optimal solution for handling content violations would be:

  • If a user has never caused a content violation, pass the message straight to the OpenAI API.
  • If a user causes a content violation, flag the user and check their future messages with the Moderations API, accepting slower responses (see the sketch after the snippet below).

This code was suggested by ChatGPT; no idea if it works, but I would love to test it:
```
// Handle policy violations or other errors
// (axios-style error object, inside the catch block of an OpenAI API call)
if (error.response && error.response.data) {
  const errorData = error.response.data;
  console.error("Error:", errorData.error.message);

  // Optionally handle specific error codes
  if (errorData.error.code === "content_policy_violation") {
    // ... flag user
  } else {
    // ... handle error
  }
} else {
  // ... handle other errors (e.g. network issues)
}
```

I implemented something similar, where all new users were assigned a risk_score (e.g. 1 = low, 2 = medium, 3 = high). The initial score was 3 and would drop to 1 (no moderation) over time, based on usage and how often the user got flagged. However, because there is no clarity on how many unmoderated flags you can accumulate as a developer, at a certain scale (e.g. >100k active users) this method could result in a ban. So I just moderate everything.


@stefr Did a lot of users get flagged?
edit: I have not been able to access the moderation API so far. But it sounds like there is quite a bit of additional delay?

Not really, but I didn’t have the scale of users I mentioned above.