Omni Moderation often fails to flag harmful text

I’m currently using OpenAI’s latest Omni Moderation model (`omni-moderation-latest`) to detect disallowed or harmful content such as profanity, insults, hate speech, and other offensive language. However, I’ve noticed that in many cases the model fails to flag content that clearly violates moderation guidelines.

I’d like to understand whether this is:

  • A known limitation of the current Omni Moderation model, or

  • Something that can be mitigated through configuration or preprocessing (e.g., custom filtering or threshold tuning).
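On the second point: the moderation endpoint returns per-category probability scores (`category_scores`) alongside its binary `flagged` decision, so one workaround I’m considering is applying my own, stricter cutoffs to those scores instead of relying on `flagged`. A minimal sketch of what I mean (the category names follow the API’s response keys, but the threshold values are illustrative assumptions, not recommendations):

```python
# Re-flag content using custom per-category thresholds applied to the
# `category_scores` returned by the moderation endpoint, rather than
# trusting the built-in boolean `flagged` field alone.

# Illustrative thresholds -- lower values make moderation stricter.
CUSTOM_THRESHOLDS = {
    "harassment": 0.3,
    "hate": 0.3,
    "violence": 0.4,
}

def reflag(category_scores, thresholds=CUSTOM_THRESHOLDS):
    """Return (flagged, triggered_categories) under custom thresholds."""
    triggered = [
        category
        for category, cutoff in thresholds.items()
        if category_scores.get(category, 0.0) >= cutoff
    ]
    return bool(triggered), triggered

# Example scores, shaped like response.results[0].category_scores:
scores = {"harassment": 0.42, "hate": 0.05, "violence": 0.10}
flagged, categories = reflag(scores)
print(flagged, categories)  # True ['harassment']
```

Even if this works, it feels like a band-aid: if the raw category scores themselves are low for clearly harmful text, no amount of threshold tuning will recover the miss, which is why I’d like to know whether this is a known model limitation.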