Is it compliant to use GPT-4o-mini for Moderation with OpenAI Policies?

Hello,

I am developing a public chat application with content moderation capabilities. Currently, I use OpenAI’s Moderation API (omni-moderation-latest) to scan suspect messages for inappropriate content. However, I’ve noticed some false positives in the results.

I’m considering using the GPT-4o-mini model alongside the Moderation API for enhanced content analysis. This dual approach would serve two key purposes (a rough code sketch follows the list):

  1. Verifying flagged content to reduce false positives by understanding contextual nuance (e.g., distinguishing between harmful content and legitimate educational discussions).
  2. Providing additional detection capabilities to identify potentially harmful content that the Moderation API might miss, especially in cases where context or subtle implications are important.
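For concreteness, here is a minimal sketch of the flow I have in mind, using the official openai Python SDK. The prompt wording, the ALLOW/REVIEW/BLOCK labels, and the choice to escalate only already-flagged messages are my own illustrative assumptions, not an officially sanctioned moderation setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def moderation_flags(text: str):
    """Stage 1: run the dedicated moderation endpoint."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    ).results[0]
    return result.flagged, result.categories

def llm_second_opinion(text: str) -> str:
    """Stage 2 (hypothetical): ask GPT-4o-mini to classify flagged content
    in context, returning a coarse label such as ALLOW / REVIEW / BLOCK."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a content-moderation assistant. Classify the user "
                    "message as ALLOW, REVIEW, or BLOCK, taking context such as "
                    "educational or quoted material into account. Reply with "
                    "the label only."
                ),
            },
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()

def moderate(text: str) -> str:
    flagged, _categories = moderation_flags(text)
    if not flagged:
        return "ALLOW"
    # Only messages the moderation endpoint flagged get the second, more expensive pass.
    return llm_second_opinion(text)
```

The idea behind this shape is that the second pass runs only on messages the moderation endpoint has already flagged, which keeps cost down and limits how much potentially harmful text is sent to the chat model.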

Compliance with OpenAI’s Guidelines

This implementation is designed to fully adhere to OpenAI’s safety policies and guidelines:

  • Flagged messages are analyzed solely for moderation purposes.
  • We strictly avoid storing or processing flagged content in any way that violates OpenAI’s data storage policies.
  • The results of the analysis are handled responsibly and are not exposed to end-users inappropriately.

Concerns and Questions

I want to ensure that this approach complies with OpenAI’s safety requirements and that, given its legitimate moderation purpose, it does not inadvertently trigger automated safety systems.

Specifically:

  • Has the OpenAI team provided any official guidance on using language models for content moderation verification?
  • Would this specific implementation comply with OpenAI’s usage policies and terms of service?

Thank you in advance for any insights or guidance you can provide on this matter!

Welcome to the community!

//What follows is my personal opinion//

It’s a very tough subject, and I wouldn’t use OpenAI’s services on a completely public-facing system. I would consider it much too risky.

The moderation endpoint will have false positives, but also false negatives that can get your account terminated. With o1, the usage terms become even more subjective and require significant pre-processing to ensure the inputs are compliant.
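To illustrate what I mean by pre-processing, here is a rough sketch (my own assumption of a workable pattern, not an official recommendation): screen the input with the moderation endpoint before it ever reaches the model. The exception type and the default model name are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class BlockedInputError(Exception):
    """Hypothetical helper: raised when the moderation endpoint flags the input."""

def guarded_completion(user_text: str, model: str = "gpt-4o-mini") -> str:
    # Screen the input first so flagged text is never sent to the model.
    screen = client.moderations.create(
        model="omni-moderation-latest",
        input=user_text,
    ).results[0]
    if screen.flagged:
        raise BlockedInputError(f"input flagged: {screen.categories}")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_text}],
    )
    return response.choices[0].message.content
```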

Historically, people have struggled with OpenAI’s support when trying to get accounts reinstated. It sounds like it’s getting better, but I can’t speak to it personally.

It’s a different story if you have paid users, or your users are under an enforceable contract (e.g. employees or enterprise customers).

TL;DR: the moderation endpoint is a must in most cases, but it won’t always save you :confused: