I recently received a warning from OpenAI regarding potential violations of their usage policies, specifically related to political campaigning and lobbying. To address this, I’ve implemented the OpenAI moderation model to help detect and prevent content that may violate these policies.
However, I’m still unsure if using the moderation model alone is enough to ensure compliance and avoid further issues. I would greatly appreciate any insights or advice from the community on:
1. Whether the moderation model is sufficient to prevent violations, or if additional safeguards are necessary.
2. Any recommended best practices or additional steps I should consider to further reduce the risk of being flagged.
3. How to know when OpenAI considers the issue fully resolved after corrective measures are implemented.
The Moderations endpoint only scores content in specific categories, ultimately returning a flag you can also act on.
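A minimal sketch of gating prompts on that endpoint before they ever reach your main completion call, assuming the current OpenAI Python SDK; the threshold and function name here are illustrative only, not part of any official guidance:

```python
from openai import OpenAI

client = OpenAI()

def screen_prompt(text: str, threshold: float = 0.5) -> bool:
    """Return True if the prompt should be blocked before hitting the main model."""
    result = client.moderations.create(input=text).results[0]
    if result.flagged:
        return True
    # category_scores is a pydantic model; dump it to inspect per-category scores.
    scores = result.category_scores.model_dump()
    return any(score > threshold for score in scores.values())

if screen_prompt("example user prompt"):
    print("Blocked by moderation screening")
```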
It doesn’t check for “international disinformation campaigns” or other undesirable uses of the kind OpenAI proudly announces account bans over, even when there is no clear terms-of-use violation.
Does your use make the world a worse place? Are you attempting to profit off the time or resources of others, or off annoying them? Then you probably shouldn’t do it.
Our platform serves thousands of users through the OpenAI API, and we want to ensure compliance with OpenAI’s policies. We understand that the current moderation model cannot handle broader issues like large-scale disinformation or misuse.
Given this, we need another solution to monitor and control the prompts more effectively. Could you recommend any additional tools or strategies we could use to prevent inappropriate use, especially at scale?
If you can “fingerprint” lots of undesirable chats and sessions, you could compute embeddings on those and compare them against a larger baseline of acceptable chats. Score how undesirable the bad ones are so your algorithm can weight its penalty accordingly.
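A rough sketch of that idea, assuming an embeddings model like `text-embedding-3-small`: embed the chats you have already flagged and a larger sample of normal traffic, then score a new chat by how much closer it sits to the flagged centroid. The variable names and scoring formula are placeholders, not a prescription.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Return unit-normalized embedding vectors for a batch of texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = np.array([item.embedding for item in resp.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

bad_centroid = embed(known_bad_chats).mean(axis=0)        # chats you have already flagged (hypothetical list)
baseline_centroid = embed(acceptable_chats).mean(axis=0)  # a larger sample of normal traffic (hypothetical list)

def undesirability(chat_text: str) -> float:
    v = embed([chat_text])[0]
    # Positive when the chat looks more like the flagged set than the baseline.
    return float(v @ bad_centroid - v @ baseline_centroid)
```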
Then you can start to identify problematic users by aggregate score, per-account average, recent detection of campaigns being run, and so on. Then send out your own “you are the worst actor on our service” warnings, after verifying that your detection is accurate.
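Continuing the sketch above, the per-account aggregation could look something like this; the shape of `recent_chats` (user ID, chat text pairs) is hypothetical:

```python
from collections import defaultdict
from statistics import mean

scores_by_user: dict[str, list[float]] = defaultdict(list)
for user_id, chat_text in recent_chats:  # hypothetical (user_id, chat_text) pairs from your logs
    scores_by_user[user_id].append(undesirability(chat_text))

# Rank accounts by average undesirability; manually review the worst before sending warnings.
ranked = sorted(scores_by_user.items(), key=lambda kv: mean(kv[1]), reverse=True)
for user_id, chat_scores in ranked[:10]:
    print(user_id, round(mean(chat_scores), 3), len(chat_scores))
```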
You can send the “user” field in your requests to OpenAI. As far as I know, this has never actually come into play to “save” anyone from their bad users, and it has barely gotten a mention in the docs for the past two years, but it shows that you are doing your own tracking and vigilance, and it also exposes the particular IDs to OpenAI if they decide to help you rather than harm you.
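A minimal example of passing that field on a chat completion request; the hashing helper and model name are assumptions for illustration, the point is just to send a stable, non-identifying per-account ID rather than raw account data:

```python
import hashlib
from openai import OpenAI

client = OpenAI()

def stable_user_id(account_id: str) -> str:
    # Hash your internal account ID so you never send a raw identifier.
    return hashlib.sha256(account_id.encode()).hexdigest()[:32]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": "Hello"}],
    user=stable_user_id("account-12345"),
)
```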