Moderations.create - how to save and parse output?

Let’s do moderations!

First, we're going to need the prerequisites: Python 3.8-3.11. Then you'll need to run pip install --upgrade openai to get the latest version of the Python library with its new client object.
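If you want to confirm the environment before going further, here's a quick sanity check (this assumes the 1.x library, which exposes openai.__version__):

import sys
import openai

print(sys.version)         # should report 3.8-3.11 per the prerequisites
print(openai.__version__)  # should be 1.x for the new client object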

OpenAI’s example

from openai import OpenAI
client = OpenAI()
client.moderations.create(input="I want to kill them.")

Lame. It fires off the API call but discards the return value, so you never see the results.

Useful example

from openai import OpenAI
client = OpenAI()
text = "I like kittens."
api_response = client.moderations.create(input=text)
response_dict = api_response.model_dump()  # convert the Pydantic model to a plain dict
is_flagged = response_dict['results'][0]['flagged']  # True if OpenAI flags the input

Now you have a plain dictionary object. I also pull out a boolean, is_flagged, that you can check to see whether the input was flagged.

In similar fashion, you can find the categories set to true that produced the flag.
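For example, a minimal sketch continuing from the response_dict above (the variable names are my own):

is_flagged = response_dict['results'][0]['flagged']
if is_flagged:
    # Keep only the categories whose boolean value is True
    flagged_categories = [
        category
        for category, value in response_dict['results'][0]['categories'].items()
        if value
    ]
    print(f"Flagged for: {', '.join(flagged_categories)}")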

See pretty results

You might not want one garbled line running off the screen, but instead nicely formatted output, with the categories alphabetized and the number values not shown in exponential form. Let's add some utility functions for an interactive script.

import json

def process(data):
    # Recursively sort dictionary keys and format any float values
    if isinstance(data, dict):
        sorted_data = {k: process(v) for k, v in sorted(data.items())}
        return {k: format_floats(v) for k, v in sorted_data.items()}
    elif isinstance(data, list):
        return [process(item) for item in data]
    else:
        return data

def format_floats(data):
    if isinstance(data, float):
        # Format floats to 10 decimal places as strings
        return f"{data:.10f}"
    else:
        return data

text = "I drown kittens."
api_response = client.moderations.create(input=text)
response_dict = api_response.model_dump()

formatted_dict = process(response_dict)
print(json.dumps(formatted_dict, indent=2))

We get output meant for humans:

{
  "id": "modr-8PkrTu6sR6pT1ztdSRAwVslnt6OtS",
  "model": "text-moderation-006",
  "results": [
    {
      "categories": {
        "harassment": false,
        "harassment/threatening": false,
        "harassment_threatening": false,
        "hate": false,
        "hate/threatening": false,
        "hate_threatening": false,
        "self-harm": false,
        "self-harm/instructions": false,
        "self-harm/intent": false,
        "self_harm": false,
        "self_harm_instructions": false,
        "self_harm_intent": false,
        "sexual": false,
        "sexual/minors": false,
        "sexual_minors": false,
        "violence": false,
        "violence/graphic": false,
        "violence_graphic": false
      },
      "category_scores": {
        "harassment": "0.0026197021",
        "harassment/threatening": "0.0043704621",
        "harassment_threatening": "0.0043704621",
        "hate": "0.0000743081",
        "hate/threatening": "0.0000794773",
        "hate_threatening": "0.0000794773",
        "self-harm": "0.0000493223",
        "self-harm/instructions": "0.0000000002",
        "self-harm/intent": "0.0000661878",
        "self_harm": "0.0000493223",
        "self_harm_instructions": "0.0000000002",
        "self_harm_intent": "0.0000661878",
        "sexual": "0.0000032877",
        "sexual/minors": "0.0000095750",
        "sexual_minors": "0.0000095750",
        "violence": "0.6199731827",
        "violence/graphic": "0.0040242169",
        "violence_graphic": "0.0040242169"
      },
      "flagged": false
    }
  ]
}

Pick one of the duplicated items

Sorting reveals that the new moderation Pydantic model object has an issue, seen in all its output methods: every category with a slash in its name is duplicated under an underscore name, and the same goes for the hyphen in self-harm.

This is likely meant to anticipate Python attribute access, where a slash or hyphen would break the name, but it is also silly, so we pick one and kill the other. Let's go with the logic that the underscore version looks better and is more reliable.

from openai import OpenAI
import json

client = OpenAI()

def process(data):
    if isinstance(data, dict):
        sorted_data = {
            k: process(v)
            for k, v in sorted(data.items())
            if '/' not in k and '-' not in k  # Drop duplicate keys containing '/' or '-'
        }
        return {k: format_floats(v) for k, v in sorted_data.items()}
    elif isinstance(data, list):
        return [process(item) for item in data]
    else:
        return data

def format_floats(data):
    if isinstance(data, float):
        # Format floats to 7 decimal places as strings
        return f"{data:.7f}"
    else:
        return data

text = "I drown kittens."
api_response = client.moderations.create(input=text)
response_dict = api_response.model_dump()

formatted_dict = process(response_dict)
print(json.dumps(formatted_dict, indent=2))

Now we get a better display that we can understand and act on:


"category_scores": {
  "harassment": "0.0026898",
  "harassment_threatening": "0.0043335",
  "hate": "0.0000773",
  "hate_threatening": "0.0000787",
  "self_harm": "0.0000494",
  "self_harm_instructions": "0.0000000",
  "self_harm_intent": "0.0000658",
  "sexual": "0.0000034",
  "sexual_minors": "0.0000100",
  "violence": "0.6200027",
  "violence_graphic": "0.0038763"
},

You might want 10 decimal places to see really low values.

Further moderations

Drowning kittens is not flagged (a flag means the input likely violates OpenAI policy), but it is more violent than we might want kids to say or receive.

In that case, you can write your own thresholds for flagging. That will take a lot of experimentation, because OpenAI doesn't publish the score values that trigger a flag in each category, so you have to find that baseline yourself and decide how much to adjust each threshold.
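A minimal sketch of what that could look like, continuing from the response_dict above. The threshold values here are illustrative placeholders for you to tune, not OpenAI's internal flagging levels:

CUSTOM_THRESHOLDS = {
    # Hypothetical per-category thresholds; tune these to your own tolerance
    "violence": 0.5,
    "violence_graphic": 0.3,
    "harassment_threatening": 0.4,
}

def custom_flag(response_dict, thresholds=CUSTOM_THRESHOLDS):
    # Return the categories whose raw score exceeds our own threshold
    scores = response_dict["results"][0]["category_scores"]
    return {
        category: score
        for category, score in scores.items()
        if score > thresholds.get(category, 1.0)  # 1.0 = never flag
    }

exceeded = custom_flag(response_dict)
if exceeded:
    print("Custom flag:", exceeded)

With the "I drown kittens." example, the violence score of about 0.62 would trip the hypothetical 0.5 threshold even though OpenAI itself returned flagged: false.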
