Content talking about racism and identity is flagged as being racist

ribab127 · January 23, 2023, 3:32am

I tried asking ChatGPT to summarize this article, and it refused to, saying it violated the content policy.

ruby_coder · January 23, 2023, 3:59am

I reviewed that article as a human being, not an AI, and I also found it more than a bit racist in nature. So it makes sense to me that OpenAI's filters would have flagged it as a violation of OpenAI's current policies.

OpenAI is in beta, in its infancy so to speak, and they have tuned their filters to be overly cautious because OpenAI wishes to avoid bad publicity.

In addition, because generative AIs have a hallucination rate close to 20%, it is difficult to predict exactly how ChatGPT will summarize an article, so OpenAI, I believe, wishes to err on the side of caution.

Hope this helps.

"hate"=>true (category), "hate"=>0.6628993153572083 (category score), "flagged"=>true

irb(main):020:0> Moderations.get_modinfo(a)
{"id"=>"modr-6bnr13tblahblahblahqfIWm6bKZX7t9Hy0",
 "model"=>"text-moderation-004",
 "results"=>
  [{"categories"=>
     {"hate"=>true,
      "hate/threatening"=>false,
      "self-harm"=>false,
      "sexual"=>false,
      "sexual/minors"=>false,
      "violence"=>false,
      "violence/graphic"=>false},
    "category_scores"=>
     {"hate"=>0.6628993153572083,
      "hate/threatening"=>0.0006026297342032194,
      "self-harm"=>0.00016012965352274477,
      "sexual"=>5.3972213208908215e-05,
      "sexual/minors"=>4.74193220725283e-05,
      "violence"=>0.06760840117931366,
      "violence/graphic"=>0.0018866335740312934},
    "flagged"=>true}]}
=> 
{"id"=>"modr-6bnr13tBOQd0EqfIWm6bKZX7t9Hy0",
 "model"=>"text-moderation-004",
 "results"=>
  [{"categories"=>
     {"hate"=>true,
      "hate/threatening"=>false,
      "self-harm"=>false,
      "sexual"=>false,
      "sexual/minors"=>false,
      "violence"=>false,
      "violence/graphic"=>false},
    "category_scores"=>
     {"hate"=>0.6628993153572083,
      "hate/threatening"=>0.0006026297342032194,
      "self-harm"=>0.00016012965352274477,
      "sexual"=>5.3972213208908215e-05,
      "sexual/minors"=>4.74193220725283e-05,
      "violence"=>0.06760840117931366,
      "violence/graphic"=>0.0018866335740312934},
    "flagged"=>true}]}
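
For anyone who wants to read a response like the one above in code, here is a minimal Ruby sketch. It assumes the response has already been parsed into a Hash shaped like the irb output. `flagged_categories` is a hypothetical helper name, not part of any OpenAI library, and the sketch makes no API call.

```ruby
# Sample response copied from the irb output above.
sample_response = {
  "id" => "modr-6bnr13tblahblahblahqfIWm6bKZX7t9Hy0",
  "model" => "text-moderation-004",
  "results" => [
    {
      "categories" => {
        "hate" => true,
        "hate/threatening" => false,
        "self-harm" => false,
        "sexual" => false,
        "sexual/minors" => false,
        "violence" => false,
        "violence/graphic" => false
      },
      "category_scores" => {
        "hate" => 0.6628993153572083,
        "hate/threatening" => 0.0006026297342032194,
        "self-harm" => 0.00016012965352274477,
        "sexual" => 5.3972213208908215e-05,
        "sexual/minors" => 4.74193220725283e-05,
        "violence" => 0.06760840117931366,
        "violence/graphic" => 0.0018866335740312934
      },
      "flagged" => true
    }
  ]
}

# Hypothetical helper (an assumption for illustration): returns a Hash of
# { category name => score } for every category the response marks as true.
# Returns an empty Hash when the overall "flagged" field is false.
def flagged_categories(response)
  result = response.fetch("results").first
  return {} unless result["flagged"]

  scores = result.fetch("category_scores")
  result.fetch("categories")
        .select { |_name, hit| hit }   # keep only categories marked true
        .keys
        .to_h { |name| [name, scores.fetch(name)] }
end

flagged = flagged_categories(sample_response)
flagged.each { |name, score| puts format("%-18s %.4f", name, score) }
```

For this response the only category marked true is "hate". Its score (about 0.66) is far above the others, which is why the overall "flagged" field is true.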
