If you don’t do it already, you can also set the “user” parameter in API requests. This helps OpenAI attribute policy-violating requests to individual end users, and they may be less likely to block your entire account in that case.
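For reference, a minimal sketch of what that looks like with the Python SDK (the model name, the ID, and the prompt are just placeholders for whatever your app actually uses):

```python
from openai import OpenAI

client = OpenAI()

end_user_id = "u_93f2a7c1"   # your stable, non-identifying per-user ID (placeholder)
user_input = "Hello!"

response = client.chat.completions.create(
    model="gpt-4o",                                      # whichever model you actually use
    messages=[{"role": "user", "content": user_input}],
    user=end_user_id,                                    # lets OpenAI tie abuse to one end user
)
print(response.choices[0].message.content)
```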
The good news is that you were able to run your operation for a while without immediately getting flagged. Apparently your user base is decent enough that you don’t need to panic.
But this is a serious situation.
The advice from @LinqLover is really good. Your aim is to be transparent and you need to demonstrate the willingness to fix the problem.
Until then, I would propose a gradual rollout: at first, simply reject all requests that get flagged by the moderation endpoint, while you work on a more elegant solution that matches your case.
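A rough sketch of that stop-gap (assuming the Python SDK; `omni-moderation-latest` is the model name the current docs list, adjust to whatever you use):

```python
from openai import OpenAI

client = OpenAI()

def check_or_reject(user_input: str) -> None:
    """Reject the request before it ever reaches the main model if moderation flags it."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=user_input,
    ).results[0]
    if result.flagged:
        # A good place to log the end-user ID and the flagged categories before rejecting.
        raise ValueError("Input rejected by the moderation endpoint")
```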
I’d add that the “How to use the moderation API” guide in the OAI Cookbook is helpful. There used to be a best practices page, but I can’t find it at the moment. Tagging users with your internal user IDs will help you flag and remove the people misusing your system. I’m sure it goes a long way in letting OAI know you’re serious about safety.
Sending end-user IDs in your requests can be a useful tool to help OpenAI monitor and detect abuse. This allows OpenAI to provide your team with more actionable feedback in the event that we detect any policy violations in your application.
The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information. If you offer a preview of your product to non-logged in users, you can send a session ID instead.
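A sketch of what that hashing might look like (SHA-256 and UUIDs here are just one reasonable choice, not something the docs prescribe):

```python
import hashlib
import uuid

def end_user_id(email_or_username: str) -> str:
    """Stable, non-identifying ID derived from the user's email or username."""
    normalized = email_or_username.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def session_id() -> str:
    """For non-logged-in previews, send a per-session ID instead."""
    return uuid.uuid4().hex

print(end_user_id("player@example.com"))
```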
I would try to make sure you’re following all best practices, or are going to implement them… you might ask for more time if needed, but let them know you’ve seen the error of your ways and are improving.
Your aim is to be transparent and you need to demonstrate the willingness to fix the problem
Do you recommend expressing my plan to fix this to trustandsafety@openai or support@openai?
One issue: I’ve seen false positives with the moderation endpoint. Blocking all flagged inputs would kill my apps.
One example that anyone can try:
Ask the same question, first about Trump, then about Biden, and pass both to the moderation endpoint. Notice that the moderation scores for harassment etc. can be 10x higher when Trump is used, with all else held constant.
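If you want to reproduce this yourself, something along these lines prints the per-category scores side by side (the example sentence is just an illustration, and the exact scores will drift as the moderation model is updated):

```python
from openai import OpenAI

client = OpenAI()

for subject in ("Trump", "Biden"):
    text = f"Why do so many people hate {subject}?"   # same sentence, only the name changes
    scores = client.moderations.create(input=text).results[0].category_scores
    print(f"{subject}: harassment={scores.harassment:.4f} hate={scores.hate:.4f}")
```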
I would email trustandsafety@openai as the email suggests. Let them know what you’ve immediately implemented to solve the problem and your action plan for moving forward, i.e. following the safety best practices as they recommend…
Send it to both and use a ‘to whom it may concern’ salutation. If it’s not relevant to the support team, they will either drop it or forward it.
Good point. I can’t (and won’t) do a risk assessment without any further insights. But you can set your own thresholds as to what you deem acceptable for the different categories. By running a bunch of recent requests against the moderation endpoint, you can try to infer what those thresholds should be.
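A sketch of what such custom thresholds could look like once you’ve inferred values from your own traffic (the numbers and the category selection below are made up, not recommendations):

```python
from openai import OpenAI

client = OpenAI()

# Per-category limits inferred from your own recent requests (made-up values).
THRESHOLDS = {
    "harassment": 0.6,
    "hate": 0.4,
    "violence": 0.5,
    "sexual": 0.3,
}

def acceptable(user_input: str) -> bool:
    """Apply your own per-category thresholds instead of the endpoint's boolean flag."""
    scores = client.moderations.create(input=user_input).results[0].category_scores
    return all(getattr(scores, category) < limit for category, limit in THRESHOLDS.items())
```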
It’s also possible that you determine this flag from the safety team against your account is in fact an error.
If you can leverage parallelization as suggested in the docs, that’s a recommended way to reduce latency. In addition, with user IDs and some additional measures as outlined in the safety best practices, you should be able to find a viable solution. I hope.
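One way to do that in practice, as a sketch with the async client (the pattern of sending the moderation call alongside the main request rather than strictly before it is what the docs describe; model name and refusal message are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def answer(user_input: str, end_user_id: str) -> str:
    # Run the moderation check and the main request concurrently to hide the extra latency.
    moderation, chat = await asyncio.gather(
        client.moderations.create(input=user_input),
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
            user=end_user_id,
        ),
    )
    if moderation.results[0].flagged:
        return "Sorry, I can't help with that."   # discard the completion rather than returning it
    return chat.choices[0].message.content
```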
Either way, developers are not supposed to use the models in a way that results in a breach of the business terms and, subsequently, the other policies. That’s part of the product design.
If you wish to reduce the number of moderations calls, you can set up a trust system, where new users have all inputs sent, and then as they qualify themselves as not producing flags, you can reduce the percentage of their calls that are pre-moderated or moderated at all.
You can also just pass the new message inputs with little context.
It is almost mandatory to come up with some system of sampling, because tier 5 gives 150,000 TPM for moderation vs 10,000,000 TPM for GPT-4o.
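A sketch of the trust-based sampling idea above (the tiers, percentages, and promotion counts are entirely made up for illustration):

```python
import random
from dataclasses import dataclass

# Fraction of requests sent to the moderation endpoint per trust level (made-up values).
SAMPLE_RATES = {"new": 1.0, "established": 0.25, "trusted": 0.05}

@dataclass
class EndUser:
    trust_level: str = "new"
    clean_requests: int = 0

def should_moderate(user: EndUser) -> bool:
    return random.random() < SAMPLE_RATES.get(user.trust_level, 1.0)

def record_clean_request(user: EndUser) -> None:
    # Promote users as they accumulate unflagged requests; reset to "new" on any flag.
    user.clean_requests += 1
    if user.clean_requests > 500:
        user.trust_level = "trusted"
    elif user.clean_requests > 50:
        user.trust_level = "established"
```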
The message about 14 days is rather ambiguous. Is it an evaluation period at the end of which you are finally judged, or a grace period in which you can clean up your act?
The policies are not public or published, probably because with knowledge of the algorithm you could push undesired submissions and generations right up to the limit.
I already set the user via the “user” parameter, and yet the ID they included in the email doesn’t match any of my users, nor is it in the expected format. If the ID referenced in the email isn’t the same as the one we sent in “user”, then how are we supposed to know who the offending user is?
Has someone hacked your endpoint, maybe? That is, sending out-of-game requests to it with a fictitious user ID? That was my first thought. I’d let them know that and try to find out how the other requests are hitting your endpoint…