Choose the latest Moderation model

Public Service Announcement

When you wire the moderation endpoint into your user-initiated prompt flow, specify the latest model explicitly; don’t leave it to the default, which is what the API docs currently show.

:x:

    const moderation = await openai.moderations.create({
        input: QUERY
    });

:white_check_mark:

    const moderation = await openai.moderations.create({
        model: "omni-moderation-latest",
        input: QUERY
    });

Example prompt that should be flagged:

Which artist could help with a murder?

:x: • Response when not specifying the latest model:

    {
      "id": "modr-APznl6mrXfY7FuNF6UhvsvbV1gzrK",
      "model": "text-moderation-007",
      "results": [
        {
          "flagged": false,
          "categories": {
            "sexual": false,
            "hate": false,
            "harassment": false,
            "self-harm": false,
            "sexual/minors": false,
            "hate/threatening": false,
            "violence/graphic": false,
            "self-harm/intent": false,
            "self-harm/instructions": false,
            "harassment/threatening": false,
            "violence": false
          },
          "category_scores": {
            "sexual": 0.00003359637412359007,
            "hate": 0.00008688968955539167,
            "harassment": 0.0005568754859268665,
            "self-harm": 0.00039510903297923505,
            "sexual/minors": 0.00002062366002064664,
            "hate/threatening": 0.00008389381400775164,
            "violence/graphic": 0.0005850853049196303,
            "self-harm/intent": 0.00029021009686402977,
            "self-harm/instructions": 0.00026303710183128715,
            "harassment/threatening": 0.0005778163322247565,
            "violence": 0.0655878558754921
          }
        }
      ]
    }

:white_check_mark: • Response when specifying the latest model:

    {
      "id": "modr-67fb784ae9a4085bb1a464b8ed0166e0",
      "model": "omni-moderation-latest",
      "results": [
        {
          "flagged": true,
          "categories": {
            "harassment": false,
            "harassment/threatening": false,
            "sexual": false,
            "hate": false,
            "hate/threatening": false,
            "illicit": true,
            "illicit/violent": true,
            "self-harm/intent": false,
            "self-harm/instructions": false,
            "self-harm": false,
            "sexual/minors": false,
            "violence": true,
            "violence/graphic": false
          },
          "category_scores": {
            "harassment": 0.0007178082510429669,
            "harassment/threatening": 0.0011162942808227092,
            "sexual": 0.00009253848866495987,
            "hate": 0.00010322310367548195,
            "hate/threatening": 0.000031999824407395835,
            "illicit": 0.45215145358042874,
            "illicit/violent": 0.41818105143776346,
            "self-harm/intent": 0.0002720324670909305,
            "self-harm/instructions": 0.0002753492651497752,
            "self-harm": 0.0005787375304387498,
            "sexual/minors": 0.00002737216731838081,
            "violence": 0.40014318138460625,
            "violence/graphic": 0.04955517785498571
          },
          "category_applied_input_types": {
            "harassment": [
              "text"
            ],
            "harassment/threatening": [
              "text"
            ],
            "sexual": [
              "text"
            ],
            "hate": [
              "text"
            ],
            "hate/threatening": [
              "text"
            ],
            "illicit": [
              "text"
            ],
            "illicit/violent": [
              "text"
            ],
            "self-harm/intent": [
              "text"
            ],
            "self-harm/instructions": [
              "text"
            ],
            "self-harm": [
              "text"
            ],
            "sexual/minors": [
              "text"
            ],
            "violence": [
              "text"
            ],
            "violence/graphic": [
              "text"
            ]
          }
        }
      ]
    }
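
Since the omni model does flag the query, here’s a minimal sketch of gating a user-initiated flow on that result. The `guardUserQuery` helper and the error wording are my own invention, not part of the SDK:

    import OpenAI from "openai";

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Hypothetical guard: throws before the query ever reaches your prompt flow
    async function guardUserQuery(query) {
        const moderation = await openai.moderations.create({
            model: "omni-moderation-latest",
            input: query,
        });
        const result = moderation.results[0];
        if (result.flagged) {
            // Collect the categories that tripped, e.g.
            // ["illicit", "illicit/violent", "violence"] for the prompt above
            const reasons = Object.entries(result.categories)
                .filter(([, hit]) => hit)
                .map(([name]) => name);
            throw new Error(`Blocked by moderation: ${reasons.join(", ")}`);
        }
        return query;
    }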

I suspect the SDK simply omits the model field from the request when you don’t set one, and the endpoint fills in the default server-side, so regardless of which SDK you use, you should probably do this.
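
If you want to check that assumption yourself, you can hit the endpoint directly with no model field and look at the model echoed back in the response. A quick throwaway sketch (Node 18+ for the global fetch):

    // No "model" in the body; whatever comes back in the response's
    // "model" field is what the server chose as the default.
    const res = await fetch("https://api.openai.com/v1/moderations", {
        method: "POST",
        headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({ input: "Which artist could help with a murder?" }),
    });
    const { model } = await res.json();
    console.log(model); // "text-moderation-007" as of the responses above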
