Structured outputs - enforce enum specified values

nlar · February 19, 2025, 2:23pm

Hello,

I am using an API request like the one below to categorize sections of a document into predefined categories.

However, the model keeps adding categories that are not specified in the enum.

Is there anything I need to change in the request for the model to only use the enums? I have been adding and removing the strict key in different places without any luck.

I hope you can help.

{
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "system",
      "content": "Prompt Here"
    },
    {
      "role": "user",
      "content": "Document Here"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "categorize_document",
        "response_format": "json",
        "description": "Categorize sections of a document into predefined categories. Do not create new categories under any circumstances",
        "strict": true,
        "parameters": {
          "type": "object",
          "strict": true,
          "properties": {
            "text": {
              "type": "string",
              "description": "The exact text found in the document."
            },
            "category": {
              "type": [
                "string",
                "null"
              ],
              "enum": [
                "World News",
                "Politics",
                "Business",
                "Technology",
                "Health",
                "Science",
                "Entertainment",
                "Sports",
                "Environment",
                "Local News",
                "Breaking News",
                "Opinion",
                "Editorial",
                "Investigative Reporting",
                "Weather",
                "Travel",
                "Lifestyle",
                "Education",
                "Culture"
              ],
              "description": "The predefined categories used for categorization. Do not create new categories under any circumstances."
            },
            "reasoning": {
              "type": "string",
              "description": "The reasoning behind the categorization"
            },
            "pageNumber": {
              "type": "string",
              "description": "The page number the reference is found on"
            }
          },
          "required": [
            "text",
            "category",
            "reasoning",
            "pageNumber"
          ],
          "additionalProperties": false
        }
      }
    }
  ]
}

_j · February 19, 2025, 4:57pm

It seems you always want a JSON response.

The AI's output is not meant to invoke a function whose return value is then useful or informative; the AI's output is itself the final format.

In that case, you’d use the response_format parameter, where you provide a JSON schema that the output is required to follow. When you make that schema “strict”, the AI should have no alternative but to use your limited enum list.

For a response_format of type json_schema, here’s the json_schema value you could provide:

{
  "name": "mandatory_json_output",
  "schema": {
    "type": "object",
    "properties": {
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "The exact text found in the document."
          },
          "category": {
            "type": "string",
            "description": "The predefined categories used for categorization. Do not create new categories under any circumstances.",
            "enum": [
              "World News",
              "Politics",
              "Business",
              "Technology",
              "Health",
              "Science",
              "Entertainment",
              "Sports",
              "Environment",
              "Local News",
              "Breaking News",
              "Opinion",
              "Editorial",
              "Investigative Reporting",
              "Weather",
              "Travel",
              "Lifestyle",
              "Education",
              "Culture"
            ]
          },
          "reasoning": {
            "type": "string",
            "description": "The reasoning behind the categorization"
          },
          "pageNumber": {
            "type": "string",
            "description": "The page number the reference is found on"
          }
        },
        "required": [
          "text",
          "category",
          "reasoning",
          "pageNumber"
        ],
        "additionalProperties": false
      }
    },
    "required": [
      "parameters"
    ],
    "additionalProperties": false
  },
  "strict": true
}
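Even with strict mode on, some people still check the parsed reply on their side before using it. Here is a minimal Python sketch of that check. It assumes the message content is the JSON string produced under the schema above, so the fields sit under a top-level "parameters" key. The names `parse_categorization`, `ALLOWED_CATEGORIES`, and `REQUIRED_FIELDS` are hypothetical and only for illustration.

```python
import json

# Hypothetical constant: the same enum values as the schema above.
ALLOWED_CATEGORIES = (
    "World News", "Politics", "Business", "Technology", "Health",
    "Science", "Entertainment", "Sports", "Environment", "Local News",
    "Breaking News", "Opinion", "Editorial", "Investigative Reporting",
    "Weather", "Travel", "Lifestyle", "Education", "Culture",
)

REQUIRED_FIELDS = ("text", "category", "reasoning", "pageNumber")


def parse_categorization(content: str) -> dict:
    """Parse the model's JSON reply and check the category against the enum.

    Assumes the reply follows the schema above, i.e. the fields are
    nested under a top-level "parameters" object.
    """
    data = json.loads(content)
    fields = data["parameters"]
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if fields["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {fields['category']!r}")
    return fields


if __name__ == "__main__":
    reply = json.dumps({
        "parameters": {
            "text": "Rain expected across the region tomorrow.",
            "category": "Weather",
            "reasoning": "The section is a forecast.",
            "pageNumber": "3",
        }
    })
    print(parse_categorization(reply)["category"])  # prints: Weather
```

If strict structured output is working, the ValueError branches should never fire. They are only a cheap safety net.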

sergeliatko · February 19, 2025, 8:42pm

I vaguely remember an old tutorial about a clever use of the JSON output format for classification. But I don't think it will work in your case, because your function name describes classifying… I think the previous response makes more sense for your use case. But if I'm right that you based your request on that tutorial, I would try renaming your function to something like validate_potential_classification. That way, the model's output is treated as a potential classification to be validated by the external function, which would basically force the model to try to classify the existing text.

But honestly, if it's for production, I would start by implementing the suggestions from the post above this one.

nlar · February 19, 2025, 9:52pm

Thanks for the input both of you - it’s much appreciated.

@_j, I am testing through Postman.

Can I use the JSON schema you provided as-is in the body, or do I need to change its format?

{
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "system",
      "content": "Prompt Here"
    },
    {
      "role": "user",
      "content": "Document Here"
    }
+ your JSON Schema

scott4 · February 20, 2025, 1:45am

We’ve been doing something similar with Pydantic schemas.

_j · February 20, 2025, 3:48am

The response schema I showed is already wrapped in OpenAI's metadata container, which, besides the schema itself, includes the name of the JSON response schema and whether to enforce strict structured output on the AI:

{
  "name": "output_json",
  "schema": {goes_here},
  "strict": true
}
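With "strict": true, OpenAI's Structured Outputs documentation requires, among other rules, that every object in the schema sets "additionalProperties": false and lists all of its properties in "required". The rough Python sketch below walks a schema and reports places that break those two rules, which may save a round trip when testing from Postman. The helper name `strict_mode_problems` is hypothetical, and it checks only these two rules, not the full list of strict-mode restrictions.

```python
def strict_mode_problems(schema: dict, path: str = "$") -> list[str]:
    """Report where `schema` breaks two strict-mode rules.

    Rules checked: objects must set additionalProperties to false, and
    every property must appear in "required". Only nested "properties"
    and array "items" are followed; other keywords are ignored.
    """
    problems = []
    types = schema.get("type") or []
    if isinstance(types, str):
        types = [types]
    if "object" in types:
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties is not false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, f"{path}.{name}"))
    if "array" in types and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems


if __name__ == "__main__":
    container = {
        "name": "output_json",
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string", "enum": ["Politics", "Sports"]},
                "note": {"type": "string"},
            },
            "required": ["category"],
            "additionalProperties": False,
        },
        "strict": True,
    }
    print(strict_mode_problems(container["schema"]))
    # prints: ["$: not in required: ['note']"]
```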

To send this schema in your API call, you pass it in the response_format parameter, at the same top level as “messages” or “model”:

{
    "model": "gpt-4o",
    "response_format": {
        "type": "json_schema",
        "json_schema": {object_goes_here}
    },
    ...
}
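To see the pieces assembled, here is a minimal Python sketch that builds the full request body as a dict and prints it as JSON you could paste into Postman's raw body. It assumes the Chat Completions endpoint and uses a trimmed-down schema (two fields, three enum values) for brevity. `build_request_body` and `CATEGORY_SCHEMA` are hypothetical names, not part of any SDK.

```python
import json


def build_request_body(system_prompt: str, document: str, schema: dict) -> dict:
    """Assemble a Chat Completions request body that uses response_format
    with a strict JSON schema (hypothetical helper for illustration)."""
    return {
        "model": "gpt-4o-mini-2024-07-18",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": document},
        ],
        # response_format sits at the top level, next to "model" and "messages".
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "output_json",
                "schema": schema,
                "strict": True,
            },
        },
    }


# Trimmed-down schema for illustration; use the full one in practice.
CATEGORY_SCHEMA = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "category": {"type": "string", "enum": ["Politics", "Business", "Sports"]},
    },
    "required": ["text", "category"],
    "additionalProperties": False,
}

if __name__ == "__main__":
    body = build_request_body("Prompt Here", "Document Here", CATEGORY_SCHEMA)
    # json.dumps turns True/False into true/false, giving valid JSON for Postman.
    print(json.dumps(body, indent=2))
```

No "tools" entry is needed in this body, since the reply itself is the structured output.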

The entire request body must be complete, valid JSON. You can’t just drop loose text into it.

The AI will then be unable to respond in anything BUT the specified JSON format.

nlar · February 20, 2025, 10:31am

@_j

It is working as expected now thanks to your input. Thank you for taking the time to help - it is much appreciated.
