API Policy Violation Warning - Advice on how to best resolve?

I have an app with thousands of public users, and it uses the OpenAI API.

I received this API warning:

I do some content moderation, but I don’t run every input through the Moderation API (because of the added latency).

I plan to implement the Moderation API and ban any repeat offenders.

Has anyone had success communicating with the trust & safety team?

It may take me more than 14 days to identify the issue, fix it, and deploy the update.

I really don’t want to lose all the fine-tuned models I’ve worked hard to build.

2 Likes

If you don’t do it already, you can also set the user parameter in API requests. This helps OpenAI identify which end users are responsible for policy-violating requests, and they may then be less likely to block your entire account.
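
For reference, a minimal sketch of what that looks like with the Python SDK (openai >= 1.0 assumed; the model name and the ID are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model, use whatever you already call
    messages=[{"role": "user", "content": "Hello!"}],
    user="user_184923",   # your own stable, per-end-user identifier
)
```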

5 Likes

Good idea, I’ll do this.

Anyone know:

Will OpenAI look at the % of requests violating this category over the next 14 days to determine whether to suspend or terminate my account?

And do I just need to ensure this drops (using the moderation endpoint etc.)?

The good news is that you were able to run your operation without immediately getting flagged. Apparently your user base is decent enough that you don’t need to panic.

But this is a serious situation.
The advice from @LinqLover is really good. Your aim is to be transparent and you need to demonstrate the willingness to fix the problem.

In case you didn’t do so already, double-check the safety best practices in the docs.

Until then, I would propose a gradual rollout: at first, simply reject all requests that get flagged by the moderation endpoint while you work on a more elegant solution that matches your case.
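
As a rough sketch of that stopgap (Python, openai >= 1.0 assumed; the refusal message and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()

def is_flagged(user_input: str) -> bool:
    # One moderation call per input; reject anything the endpoint flags.
    return client.moderations.create(input=user_input).results[0].flagged

def handle_request(user_input: str) -> str:
    if is_flagged(user_input):
        return "Sorry, I can't help with that request."
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_input}],
    )
    return completion.choices[0].message.content
```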

Good luck!

6 Likes

I’d add that the “How to use the moderation API” guide in the OAI Cookbook is helpful. There used to be a best practices page, but I can’t find it at the moment. Tagging requests with your internal user IDs will help you flag and remove the people misusing your system. I’m sure it goes a long way in letting OAI know you’re serious about safety.

Ah, here we go… OpenAI Safety best practices

## End-user IDs

Sending end-user IDs in your requests can be a useful tool to help OpenAI monitor and detect abuse. This allows OpenAI to provide your team with more actionable feedback in the event that we detect any policy violations in your application.

The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information. If you offer a preview of your product to non-logged in users, you can send a session ID instead.
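
In code, the hashing part of that recommendation could look something like this (a sketch in Python; `end_user_id` is just an illustrative helper name):

```python
import hashlib

def end_user_id(username_or_email: str) -> str:
    # SHA-256 of the lowercased username/email: stable per user,
    # but doesn't reveal the underlying identity to OpenAI.
    return hashlib.sha256(username_or_email.lower().encode("utf-8")).hexdigest()

# Pass end_user_id("alice@example.com") as the `user` parameter,
# or a random session ID for non-logged-in preview users.
```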

I would make sure you’re following all the best practices, or are going to implement them… You might ask for more time if needed, but let them know you’ve seen the error of your ways and are improving.

6 Likes

Thank you @vb

Your aim is to be transparent and you need to demonstrate the willingness to fix the problem

Do you recommend sending my plan for fixing this to trustandsafety@openai or support@openai?

One issue: I’ve seen false positives with the moderation endpoint. Blocking all flagged inputs would kill my apps.

One example that anyone can try:

Ask the same question, first about Trump, then about Biden, and pass both to the moderation endpoint. The moderation scores can be 10x higher for harassment etc. when Trump is mentioned, with all else held constant.
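
If anyone wants to reproduce that comparison, a quick sketch with the Python SDK (the prompt wording is just an example):

```python
from openai import OpenAI

client = OpenAI()

# Same question, only the name swapped; print one category score for each.
for name in ("Trump", "Biden"):
    prompt = f"What do you think about {name}?"
    scores = client.moderations.create(input=prompt).results[0].category_scores
    print(name, "harassment:", scores.harassment)
```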

1 Like

I would email trustandsafety@openai as the email suggests. Let them know what you’ve immediately implemented to solve the problem and your action plan for moving forward, i.e. following the safety best practices they recommend…

What kind of app do you have?

Just a chatbot? Open text input?

2 Likes
  1. Send it to both and add a ‘to whom it may concern’ salutation. If it’s not relevant to the support team, they will either drop it or forward it.

  2. Good point. I can’t (and won’t) do a risk assessment without further insight. But you can set your own thresholds for what you deem acceptable in the different dimensions. By running a batch of recent requests against the moderation endpoint, you can try to infer what those thresholds should be (see the sketch after this list).
    It’s also possible that you determine this flag from the safety team against your account is in fact an error.
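
As a rough sketch of that threshold-inference idea (Python; `recent_inputs` stands in for your own logged, known-acceptable user inputs):

```python
from openai import OpenAI

client = OpenAI()
recent_inputs = ["..."]  # replace with real logged inputs you consider acceptable

# Track the highest score seen per category across the sample.
max_scores: dict[str, float] = {}
for text in recent_inputs:
    scores = client.moderations.create(input=text).results[0].category_scores
    for category, score in scores.model_dump().items():
        max_scores[category] = max(max_scores.get(category, 0.0), score or 0.0)

# Use these maxima (plus a margin) as starting per-category cutoffs,
# then tighten them as you gather more data.
print(max_scores)
```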

3 Likes

Find a bad-words list and check against it in PHP first… No extra latency.
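
Something along those lines, sketched in Python to match the other examples here (the word list is obviously a placeholder):

```python
# A local word-list check costs no API latency and can catch the most
# obvious inputs before (or instead of) a moderation call.
BANNED_WORDS = {"badword1", "badword2"}  # placeholder list

def fails_local_check(user_input: str) -> bool:
    words = user_input.lower().split()
    return any(word in BANNED_WORDS for word in words)
```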

2 Likes

I want to avoid the 0.5-1 s latency of making a moderation call before the LLM call.

Will OpenAI trust & safety accept parallel calls?

(i.e., don’t send a flagged result back to the user / cancel the stream)

I’m concerned I have to make sequential calls or they’ll ban my account… but my users will hate the added latency.
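
For what it’s worth, a hedged sketch of that parallel approach with the async Python client (non-streaming for simplicity; with streaming you’d hold back or cancel the stream once the moderation result arrives):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def moderated_completion(user_input: str) -> str | None:
    # Fire the moderation check and the chat call at the same time.
    moderation_task = client.moderations.create(input=user_input)
    completion_task = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_input}],
    )
    moderation, completion = await asyncio.gather(moderation_task, completion_task)
    if moderation.results[0].flagged:
        return None  # discard the (already generated) answer instead of returning it
    return completion.choices[0].message.content

# asyncio.run(moderated_completion("Hello!"))
```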

2 Likes

Parallelization, as suggested in the docs, is a recommended best practice for reducing latency. Combined with user IDs and some of the additional measures outlined in the safety best practices, you should be able to find a viable solution. I hope.

Either way, developers are not supposed to use the models in a way that breaches the business terms and, subsequently, the other policies. That’s part of the product design.

2 Likes

If you wish to reduce the number of moderation calls, you can set up a trust system, where new users have all their inputs sent to moderation, and then, as they demonstrate they don’t produce flags, you reduce the percentage of their calls that are pre-moderated or moderated at all.

You can also pass just the new message inputs with little context.

Some system of sampling is almost mandatory, because tier 5 gives 150,000 TPM for moderation vs. 10,000,000 TPM for GPT-4o.
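
A sketch of that trust/sampling idea (the thresholds and percentages below are made up; tune them to your own traffic):

```python
import random

def moderation_probability(clean_requests: int, flags: int) -> float:
    if flags > 0 or clean_requests < 50:
        return 1.0   # always moderate new or previously flagged users
    if clean_requests < 500:
        return 0.25  # spot-check established users
    return 0.05      # light sampling for long-standing clean users

def should_moderate(clean_requests: int, flags: int) -> bool:
    return random.random() < moderation_probability(clean_requests, flags)
```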

4 Likes

Great ideas, I appreciate them and am trying to implement some of these ASAP.

I’ve already implemented some minor moderation changes

(more to come after I test extensively and solve the latency issue - sometimes the moderation call takes >1 second!)

2 questions:

  1. How will OpenAI determine that I’m no longer in violation?
  • e.g. if the % of my requests scoring high in this category drops back below a threshold? Or if they see I’m using the moderation endpoint?
  2. How will I know if OpenAI decides I’m back in the non-violation zone within the next 2 weeks?
  • e.g. would I get an email saying I’m back on track? Or would I simply not get another (automated?) warning email?

The message about 14 days is rather ambiguous: is it an evaluation period at the end of which you are judged, or a grace period in which you must clean up your act?

The policies are not public or published, probably because with knowledge of the algorithm you could push right up to the limit of undesired submissions and generations.

Thanks

It’s now been >7 days since the first warning email with the two-week notice.

I’ve made some changes, and I’ve gotten no follow-up or ‘one week’ notice.

Does this mean my account is no longer in the warning/remediation category?

I’d just hate to get a sudden email saying “terminated” on day 14 with no second warning / heads-up.

1 Like

Have you contacted them as suggested? I would be proactive rather than reactive at this point. Good luck!

1 Like

A post was split to a new topic: How to set the user parameter using the Assistants API

I already set the user via the “user” parameter, and yet the ID they included in the email doesn’t match any of my users, nor is it in the expected format. If the ID referenced in the email isn’t the same as the one we sent in “user”, then how are we supposed to know who the offending user is?

I’m already sending the user field, but the ID they included in the email is not in the same format as the user ID I’m sending them.

Has someone hacked your endpoint, maybe? That is, sending requests to it from outside your app with a fictitious user ID? That was my first thought. I’d let them know that and try to find out how the other requests are hitting your endpoint…