Structured Outputs - Enforce JSON value to be one of the enum values specified

desaxce · November 3, 2024, 8:31am

I made an LLM call with structured outputs, asking for a JSON object with a single key, `language`, which can only take certain values:

LLM call parameters

{
  "model": "gpt-4o-mini-2024-07-18",
  "temperature": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "classification",
      "strict": false,
      "schema": {
        "strict": true,
        "type": "object",
        "properties": {
          "language": {
            "type": "string",
            "description": "The primary language of the text content",
            "enum": [
              "english",
              "french",
              "japanese",
              "korean",
              "italian",
              "german",
              "spanish",
              "chinese",
              "polish",
              "hindi",
              "indonesian",
              "russian"
            ]
          }
        },
        "required": [
          "language"
        ],
        "additionalProperties": false
      }
    }
  }
}

I used the following system/user messages:

Messages

[
  {
    "role": "system", 
    "content": "You are a helpful assistant that analyzes text and returns structured JSON."
  },
  {
    "role": "user",
    "content": "This is the content I pass to Chainlit:'\n                Källa 1. [850.json](https://fiskarhedenvillan.com)\n\nKälla\n\n\nPrompt tokens: 1016, Completion tokens: 61, Total: 1077 (0.06 kr)\n"
  }
]

I expected `language` to be one of the values in my enum, but I actually received `swedish`, which is not one of the languages I listed in the enum!

The same happens with gpt-4o-2024-08-06.

How can I force the value of the `language` key to be one of the given enum values?

I also tried the `oneOf` syntax, to no avail.

JSON schema with `oneOf` syntax

{
  "name": "classification",
  "strict": false,
  "schema": {
    "type": "object",
    "strict": true,
    "required": [
      "language"
    ],
    "properties": {
      "language": {
        "type": "string",
        "description": "The primary language of the text content",
        "enum": [
          "english",
          "french",
          "japanese",
          "korean",
          "italian",
          "german",
          "spanish",
          "chinese",
          "polish",
          "hindi",
          "indonesian",
          "russian"
        ],
        "oneOf": [
          {
            "const": "english",
            "description": "Content primarily in English language"
          },
          {
            "const": "french",
            "description": "Content primarily in French language"
          },
          {
            "const": "japanese",
            "description": "Content primarily in Japanese language"
          },
          {
            "const": "korean",
            "description": "Content primarily in Korean language"
          },
          {
            "const": "italian",
            "description": "Content primarily in Italian language"
          },
          {
            "const": "german",
            "description": "Content primarily in German language"
          },
          {
            "const": "spanish",
            "description": "Content primarily in Spanish language"
          },
          {
            "const": "chinese",
            "description": "Content primarily in Chinese language"
          },
          {
            "const": "polish",
            "description": "Content primarily in Polish language"
          },
          {
            "const": "hindi",
            "description": "Content primarily in Hindi language"
          },
          {
            "const": "indonesian",
            "description": "Content primarily in Indonesian language"
          },
          {
            "const": "russian",
            "description": "Content primarily in Russian language"
          }
        ]
      }
    },
    "additionalProperties": false
  }
}
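As a rough illustration, here is a simplified Python model of the two keywords (not a full JSON Schema validator). It shows that the `enum` list and the `oneOf`/`const` branches above accept the same twelve values and both already reject `swedish`. Switching from `enum` to `oneOf` therefore does not change which values the schema permits. The names `ENUM`, `ONE_OF`, `matches_enum` and `matches_one_of` are hypothetical.

```python
# Values from the "enum" in the question.
ENUM = ["english", "french", "japanese", "korean", "italian", "german",
        "spanish", "chinese", "polish", "hindi", "indonesian", "russian"]

# The question's "oneOf" branches list the same values; only "const" is kept
# here (the per-branch descriptions do not affect which values are allowed).
ONE_OF = [{"const": v} for v in ENUM]


def matches_enum(value, enum):
    # "enum": the value must equal one of the listed values.
    return value in enum


def matches_one_of(value, branches):
    # "oneOf": exactly one branch must match; a {"const": c} branch
    # matches only the value c.
    return sum(1 for branch in branches if branch["const"] == value) == 1


for candidate in ["french", "swedish"]:
    print(candidate, matches_enum(candidate, ENUM), matches_one_of(candidate, ONE_OF))
# french True True
# swedish False False
```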

MWE to reproduce: https://platform.openai.com/playground/chat?preset=hZDr2dMFgQpox5kEnw2BbrI2

_j · November 4, 2024, 5:39pm

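It is the outermost `"strict": true`, the one on the `json_schema` object next to `"name"` and before the actual schema contents, that enforces structured outputs. In your request that one is set to `false`.

With `false`, the output is not held to the schema: the AI could write `"language": "banana"` if you asked for that.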
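A minimal sketch of the corrected `response_format` as a Python dict follows. It assumes the request is sent with the official `openai` Python package. `"strict": True` now sits on the `json_schema` object. The `strict` key that was inside `schema` is left out, since the reply says the outer one is what matters. The `parse_language` helper is hypothetical: it adds a cheap client-side check that the returned value is in the enum.

```python
import json

# Allowed values, copied from the enum in the question.
LANGUAGES = [
    "english", "french", "japanese", "korean", "italian", "german",
    "spanish", "chinese", "polish", "hindi", "indonesian", "russian",
]

# Corrected response_format: "strict": True is set on the json_schema object,
# next to "name", rather than inside "schema".
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "classification",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "language": {
                    "type": "string",
                    "description": "The primary language of the text content",
                    "enum": LANGUAGES,
                }
            },
            "required": ["language"],
            "additionalProperties": False,
        },
    },
}


def parse_language(raw: str) -> str:
    """Hypothetical client-side guard: parse the model's JSON reply and
    raise ValueError if the language is not one of the enum values."""
    value = json.loads(raw)["language"]
    if value not in LANGUAGES:
        raise ValueError(f"unexpected language: {value!r}")
    return value
```

With the `openai` package, the dict would go in the `response_format=` argument of `client.chat.completions.create(...)`. `parse_language` would then be applied to `completion.choices[0].message.content`. For example, `parse_language('{"language": "swedish"}')` raises `ValueError`.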

desaxce · November 4, 2024, 5:59pm

Thank you, I don’t know how I missed that one!

I guess I got confused by the multiple `strict` keys in the JSON schema.
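The two `strict` keys in the question live at different depths of the request. The sketch below uses a small hypothetical helper, `find_key_paths`, written in Python for illustration, to list every `strict` key with its path in a trimmed copy of the question's request. Per the reply above, only the one at `response_format.json_schema.strict` switches enforcement on.

```python
def find_key_paths(node, key, path="$"):
    """Yield (path, value) for every dict entry named `key`, searching
    nested dicts and lists recursively."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child, v
            yield from find_key_paths(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, f"{path}[{i}]")


# Trimmed copy of the request from the question (enum list shortened).
request = {
    "model": "gpt-4o-mini-2024-07-18",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "classification",
            "strict": False,
            "schema": {
                "strict": True,
                "type": "object",
                "properties": {
                    "language": {"type": "string", "enum": ["english", "french"]}
                },
                "required": ["language"],
                "additionalProperties": False,
            },
        },
    },
}

for p, v in find_key_paths(request, "strict"):
    print(p, v)
# $.response_format.json_schema.strict False
# $.response_format.json_schema.schema.strict True
```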
