Hi, I run a simple AI-based text editor app and recently got a warning from OpenAI due to a moderation issue. I'm appalled at the thought of someone using my app for harmful content, so I was eager to implement the Moderation API.
I'm confused by the flag system, though, as it seems over-aggressive in moderating innocuous requests. My first implementation was to count how many of a user's requests are flagged (`flagged: true`) and automatically suspend the account after 3, leaving some room to warn the end user first. That was firing way too frequently, so I switched to going off the actual category scores instead, flagging the account whenever any individual score goes over 0.5. This also seems way too strict, however: a prompt as simple as "describe a WWII battle" comes back with a 0.6 for violence.
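For reference, here's roughly what my current score-based check looks like. This is a simplified sketch using the official Python `openai` SDK; the `SCORE_THRESHOLD` constant and the way I dump the scores to a dict are just how I've wired it up in my app, not anything prescribed by the API.

```python
from openai import OpenAI

client = OpenAI()

SCORE_THRESHOLD = 0.5  # treat the request as harmful if any category score exceeds this


def is_harmful(text: str) -> bool:
    """Return True if any moderation category score exceeds the threshold."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    result = response.results[0]
    # category_scores is a model object; dump it to a plain dict of floats
    scores = result.category_scores.model_dump()
    return any(score > SCORE_THRESHOLD for score in scores.values())


# "describe a WWII battle" trips this check because its violence score comes back ~0.6
```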
There must be a better way to detect harmful content on our end, as I’m certain people asking about WWII wouldn’t be banned from ChatGPT. I’ve seen a few people asking similar questions in these forums with no official answer, and the docs don’t provide any guidance either.
Any suggestions? Many thanks!