I’m implementing moderation for my AI app and I’m a bit unsure about the correct architecture.
Right now I’m using the Moderation API and it flags a lot of content that isn’t necessarily harmful in context. If I treat all flagged content as a hard block, it ends up overblocking legitimate use cases (e.g. users asking for help or advice in sensitive situations).
What’s the recommended best practice here?
Specifically:
- How do you distinguish content that should always be blocked from content that is sensitive but still legitimate in context?
- What’s the correct way to handle flagged content without degrading the user experience?
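
To make the question concrete, the alternative to a hard block could look something like the sketch below. It is only a rough illustration, assuming the moderation result exposes per-category scores between 0 and 1. The category names, the thresholds, and the `route` function are hypothetical placeholders, not taken from any particular API.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"      # pass the message through unchanged
    SUPPORT = "support"  # answer, but via a more careful, supportive flow
    BLOCK = "block"      # refuse outright


# Hypothetical policy tables: category name -> score threshold.
# These names and numbers are placeholders and would need tuning.
HARD_BLOCK = {"sexual/minors": 0.5}
SENSITIVE = {"self-harm": 0.4, "violence": 0.6}


def route(scores: dict[str, float]) -> Action:
    """Map per-category moderation scores to an action instead of a yes/no flag."""
    # Categories that are never acceptable are checked first.
    for category, threshold in HARD_BLOCK.items():
        if scores.get(category, 0.0) >= threshold:
            return Action.BLOCK
    # Sensitive categories do not block; they switch to a careful response path.
    for category, threshold in SENSITIVE.items():
        if scores.get(category, 0.0) >= threshold:
            return Action.SUPPORT
    return Action.ALLOW
```

With this shape, a message scoring high only on a sensitive category would be routed to `Action.SUPPORT` rather than rejected, which is the behavior the overblocking problem above seems to call for.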
Would really appreciate a concrete example or reference architecture from anyone who has implemented this properly.