I'm relatively new to using the API (ChatGPT), but I've noticed an issue with less commonly used languages. The same text in English and Finnish yields different results from the moderation endpoint. The category scores don't help either, because they are very low for the Finnish input.
Finnish input: Tapan sinut kirveellä
ModerationResponse{id='modr-8783qTrGJAp2t3kNLAmtY8wB8uKsu', model='text-moderation-006', results=[Result{flagged=false, categories={sexual=false, hate=false, harassment=false, self-harm=false, sexual/minors=false, hate/threatening=false, violence/graphic=false, self-harm/intent=false, self-harm/instructions=false, harassment/threatening=false, violence=false}, category_scores={sexual=0.0010161410318687558, hate=2.1737563656643033E-4, harassment=9.192628785967827E-4, self-harm=1.8292998720426112E-4, sexual/minors=3.535413707140833E-4, hate/threatening=2.5697704404592514E-4, violence/graphic=2.761634641501587E-5, self-harm/intent=4.713524322141893E-5, self-harm/instructions=5.2454819524427876E-5, harassment/threatening=1.919300848385319E-4, violence=0.007186449598520994}}]}
English input: I kill your with axe
ModerationResponse{id='modr-8783s4bZwGLAJhJcq2KwIVKNqopMK', model='text-moderation-006', results=[Result{flagged=true, categories={sexual=false, hate=false, harassment=false, self-harm=false, sexual/minors=false, hate/threatening=false, violence/graphic=false, self-harm/intent=false, self-harm/instructions=false, harassment/threatening=true, violence=true}, category_scores={sexual=4.538599168881774E-4, hate=4.3008412467315793E-4, harassment=0.4044300615787506, self-harm=5.808872447232716E-5, sexual/minors=3.867456825901172E-7, hate/threatening=4.94822976179421E-4, violence/graphic=7.7569589484483E-4, self-harm/intent=2.4974851839942858E-5, self-harm/instructions=6.064902322577836E-7, harassment/threatening=0.47488856315612793, violence=0.9963110089302063}}]}
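To make the gap between the two responses concrete, here is a minimal Python sketch that applies custom per-category thresholds to the `category_scores` quoted above. The threshold values and the helper name `categories_over` are assumptions made up for illustration; they are not the thresholds the moderation model uses internally.

```python
from typing import Dict, List

# Scores copied from the two responses above (only three categories shown).
FINNISH_SCORES: Dict[str, float] = {
    "harassment": 9.192628785967827e-4,
    "harassment/threatening": 1.919300848385319e-4,
    "violence": 0.007186449598520994,
}
ENGLISH_SCORES: Dict[str, float] = {
    "harassment": 0.4044300615787506,
    "harassment/threatening": 0.47488856315612793,
    "violence": 0.9963110089302063,
}

# Hypothetical thresholds chosen for this example only.
THRESHOLDS: Dict[str, float] = {
    "harassment/threatening": 0.4,
    "violence": 0.5,
}


def categories_over(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Return the sorted category names whose score meets its threshold.

    Categories without an explicit threshold fall back to `default`.
    """
    return sorted(
        name
        for name, score in scores.items()
        if score >= thresholds.get(name, default)
    )


if __name__ == "__main__":
    print("English:", categories_over(ENGLISH_SCORES, THRESHOLDS))
    print("Finnish:", categories_over(FINNISH_SCORES, THRESHOLDS))
    # Ratio between the English and Finnish violence scores.
    print("violence ratio:", ENGLISH_SCORES["violence"] / FINNISH_SCORES["violence"])
```

With these assumed thresholds the English input trips two categories and the Finnish input trips none. Lowering the thresholds far enough to catch the Finnish violence score (about 0.007) would probably flag a lot of harmless text as well, which is why the scores are hard to use here.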
This makes it challenging to use moderation directly in, for example, a customer chat or a similar application. If one were to first translate everything into English with ChatGPT via an API call, would that application violate any rules? And how is such text categorized otherwise, especially when the context is literature?
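The translate-first idea could be sketched roughly as below. This is only an outline under assumptions: `translate` and `moderate` are hypothetical callables standing in for whatever API client is used (they are not real library functions), and keeping the higher of the two scores per category is one possible design choice, not an established recommendation. It also does not answer whether such a pipeline is allowed by the usage rules.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def moderate_via_translation(
    text: str,
    translate: Callable[[str], str],
    moderate: Callable[[str], Scores],
) -> Scores:
    """Moderate both the original text and its English translation.

    `translate` (hypothetical) maps the input text to English.
    `moderate` (hypothetical) maps a text to per-category scores.
    For each category, the higher of the two scores is kept, so a hit
    on either version of the text is preserved.
    """
    original_scores = moderate(text)
    translated_scores = moderate(translate(text))

    categories = set(original_scores) | set(translated_scores)
    return {
        category: max(
            original_scores.get(category, 0.0),
            translated_scores.get(category, 0.0),
        )
        for category in categories
    }
```

Passing the two calls in as parameters keeps the sketch independent of any particular client library and makes it easy to try out with stub functions before wiring in real API calls.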
I have read that the moderation section of the documentation warns that not everything works as intended for less commonly used languages.
I guess I’m still a bit out of the loop on these matters.