I recently received a warning from OpenAI regarding potential violations of their usage policies, specifically related to political campaigning and lobbying. To address this, I’ve implemented the OpenAI moderation model to help detect and prevent content that may violate these policies.
However, I’m still unsure if using the moderation model alone is enough to ensure compliance and avoid further issues. I would greatly appreciate any insights or advice from the community on:
1. Whether the moderation model is sufficient to prevent violations, or if additional safeguards are necessary.
2. Any recommended best practices or additional steps I should consider to further reduce the risk of being flagged.
3. How to know when OpenAI considers the issue fully resolved after corrective measures are implemented.
The Moderations endpoint only scores content in specific categories, ultimately returning a flag you can also act on.
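A minimal sketch of gating prompts on that endpoint before they ever reach your main completion call, assuming the current OpenAI Python SDK; the threshold and function name here are illustrative only, not part of any official guidance:

```python
from openai import OpenAI

client = OpenAI()

def screen_prompt(text: str, threshold: float = 0.5) -> bool:
    """Return True if the prompt should be blocked before hitting the main model."""
    result = client.moderations.create(input=text).results[0]
    if result.flagged:
        return True
    # category_scores is a pydantic model; dump it to inspect per-category scores.
    scores = result.category_scores.model_dump()
    return any(score > threshold for score in scores.values())

if screen_prompt("example user prompt"):
    print("Blocked by moderation screening")
```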
It doesn’t check for “international disinformation campaigns” or other undesirable uses of the kind OpenAI proudly announces account bans over, even when there is no clear terms-of-use violation.
Does your use make the world a worse place? Are you attempting to profit off the time or resources of others, or off annoying them? Then you probably shouldn’t do it.
Our platform serves thousands of users through the OpenAI API, and we want to ensure compliance with OpenAI’s policies. We understand that the current moderation model cannot handle broader issues like large-scale disinformation or misuse.
Given this, we need another solution to monitor and control the prompts more effectively. Could you recommend any additional tools or strategies we could use to prevent inappropriate use, especially at scale?
If you can “fingerprint” lots of undesirable chats and sessions, you could compute embeddings on those and compare them against a larger baseline of acceptable chats. Score how undesirable the bad ones are so your algorithm can weight its penalty accordingly.
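A rough sketch of that idea, assuming an embeddings model like `text-embedding-3-small`: embed the chats you have already flagged and a larger sample of normal traffic, then score a new chat by how much closer it sits to the flagged centroid. The variable names and scoring formula are placeholders, not a prescription.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Return unit-normalized embedding vectors for a batch of texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = np.array([item.embedding for item in resp.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

bad_centroid = embed(known_bad_chats).mean(axis=0)        # chats you have already flagged (hypothetical list)
baseline_centroid = embed(acceptable_chats).mean(axis=0)  # a larger sample of normal traffic (hypothetical list)

def undesirability(chat_text: str) -> float:
    v = embed([chat_text])[0]
    # Positive when the chat looks more like the flagged set than the baseline.
    return float(v @ bad_centroid - v @ baseline_centroid)
```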
Then you can start to identify problematic users by aggregate score, per-account average, recent detection of campaigns being run, and so on. Then send out your own “you are the worst actor on our service” warnings, after verifying that your detection is accurate.
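Continuing the sketch above, the per-account aggregation could look something like this; the shape of `recent_chats` (user ID, chat text pairs) is hypothetical:

```python
from collections import defaultdict
from statistics import mean

scores_by_user: dict[str, list[float]] = defaultdict(list)
for user_id, chat_text in recent_chats:  # hypothetical (user_id, chat_text) pairs from your logs
    scores_by_user[user_id].append(undesirability(chat_text))

# Rank accounts by average undesirability; manually review the worst before sending warnings.
ranked = sorted(scores_by_user.items(), key=lambda kv: mean(kv[1]), reverse=True)
for user_id, chat_scores in ranked[:10]:
    print(user_id, round(mean(chat_scores), 3), len(chat_scores))
```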
You can send the “user” field in your requests to OpenAI. As far as I know, this has never actually come into play to “save” anyone from their bad users, and it has barely gotten a mention in the docs for the past two years, but it shows that you are doing your own tracking and vigilance, and it also exposes the particular IDs to OpenAI if they decide to help you rather than harm you.
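A minimal example of passing that field on a chat completion request; the hashing helper and model name are assumptions for illustration, the point is just to send a stable, non-identifying per-account ID rather than raw account data:

```python
import hashlib
from openai import OpenAI

client = OpenAI()

def stable_user_id(account_id: str) -> str:
    # Hash your internal account ID so you never send a raw identifier.
    return hashlib.sha256(account_id.encode()).hexdigest()[:32]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": "Hello"}],
    user=stable_user_id("account-12345"),
)
```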