Sexual related text not flagged by text-moderation-latest

angelicoalejandro · July 11, 2025, 2:34am

I’m testing the text-moderation-latest API to detect harmful content. If I add an insult to the text, the text is flagged, but if I add a link to a porn site or write “porn” or similar words, it returns false.

Is there some tweak to make it more strict?

_j · July 11, 2025, 10:47am

You can use the numeric category scores that are returned instead of relying only on the boolean flag. Since the flag is set at an undocumented threshold, you’ll have to test and infer what score cutoff, evaluated in your own code, is actually more sensitive than the flag.
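
As a rough illustration of the score-based approach, here is a minimal sketch. It assumes you have already copied the per-category scores from the moderation response into a plain dict. The threshold values and the names `CUSTOM_THRESHOLDS` and `strict_flags` are made up for this example, and the scores are invented, so tune the numbers against your own test inputs.

```python
# Hypothetical per-category cutoffs (assumptions, not documented values).
# Lower numbers make the check stricter than the built-in flag is likely to be.
CUSTOM_THRESHOLDS = {"sexual": 0.05, "harassment": 0.30}


def strict_flags(category_scores, thresholds=CUSTOM_THRESHOLDS):
    """Return the categories whose score meets or exceeds a custom cutoff."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if category_scores.get(name, 0.0) >= limit
    )


# Invented scores, shaped like the category scores of a moderation result.
example_scores = {"sexual": 0.12, "harassment": 0.01, "violence": 0.0004}
print(strict_flags(example_scores))  # ['sexual']
```

Categories without an entry in the thresholds dict are ignored, so you can tighten only the ones you care about and leave the rest to the built-in flag.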

You can also see if the omni moderation model gives the flagging you want. There’s another thread here about it going off with an unwanted “sexual” flag, so it may be what you want.

angelicoalejandro · July 11, 2025, 2:35pm

Thank you. I asked ChatGPT (should have done this before posting) and now I realize the moderation looks for “harmful content”, and just talking about sex or porn is not always harmful content. It suggested adding some regular filters for specific words for this case.
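
For anyone landing here later, a word filter like that could look roughly like the sketch below. It assumes a regular-expression blocklist is acceptable for your use case. The term list and the names `BLOCKED_TERMS`, `keyword_hits` and `should_block` are hypothetical, and `moderation_flagged` stands in for the boolean you get back from the moderation call.

```python
import re

# Hypothetical blocklist; extend it with the words and site names you care about.
BLOCKED_TERMS = ["porn", "xxx"]

# Match a blocked term at the start of a word, plus any trailing word characters,
# so "porn" and "PornSite" both match. Case is ignored.
BLOCKED_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKED_TERMS) + r")\w*",
    re.IGNORECASE,
)


def keyword_hits(text):
    """Return the lowercased words in the text that start with a blocked term."""
    return [match.group(0).lower() for match in BLOCKED_PATTERN.finditer(text)]


def should_block(text, moderation_flagged):
    """Block when either the moderation flag or the keyword filter fires."""
    return moderation_flagged or bool(keyword_hits(text))


print(keyword_hits("Check out www.PornSite.example today"))  # ['pornsite']
print(should_block("A perfectly normal sentence.", False))   # False
```

A plain keyword list will produce false positives and miss obfuscated spellings, so it works best as a second check next to the moderation result rather than a replacement for it.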
