Sometimes (maybe less than once in every thousand requests) the model doesn’t follow the strict structured output schema. See the following example: even though a json_schema is supplied, the output is a plain string without the defined fields.
I’m not sure whether the temperature value would improve this (especially since you reported a less than 1 in 1000 error rate), but what is the value you’re using at the moment? Also, was there anything peculiar about the input prompts in the cases where it does deviate?
Your other topic’s result is gpt-4.1 failing to emit the correct stop sequence.
Also, the Responses endpoint blocks you from working around this. When a failure of the model is combined with a failure of the endpoint (yes, I mean that in the non-transitory sense), the result is that there is no alternate stop sequence parameter (such as a linefeed) to terminate the output, no working bias parameter to promote the correct stop token, and no frequency penalty to demote 2,000 tokens of a loop. Only max_output_tokens lets you set a ceiling on the nonstop output.
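For comparison, here is a rough sketch of the levers the older Chat Completions endpoint exposes for exactly this failure mode, against the single token-budget control on Responses. The parameter values (placeholder prompt, bias token ID, penalty strength) are illustrative, not taken from the original report:

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions: several levers exist to stop or discourage runaway output.
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "..."}],
    stop=["\n"],             # alternate stop sequence, e.g. a linefeed
    logit_bias={123: 5},     # hypothetical token ID: promote the correct stop token
    frequency_penalty=0.5,   # demote a long loop of repeated tokens
    max_tokens=500,
)

# Responses: none of the above; only a hard ceiling on generated tokens.
response = client.responses.create(
    model="gpt-4.1",
    input="...",
    max_output_tokens=500,
)
```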
Here, however, the context-free grammar for the response format sent with the request simply did not work. It is likely that no schema was even placed into the context for the AI to follow; the AI just responded to the user.
Diagnosis:
One can carefully reconstruct the same call and compare the input token expense against what that API call was previously billed: you will be charged more when your schema language is actually placed in context for the AI than when it is not, or when you simply don’t send the text format for json_schema as an API parameter.
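A minimal sketch of that check, assuming the Python SDK and an illustrative schema (the exact token delta depends on the schema’s size): send the same input twice, with and without the json_schema format, and compare the billed input token counts.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative schema, used only to show the comparison.
SCHEMA = {
    "type": "object",
    "properties": {"label": {"type": "string"}, "score": {"type": "number"}},
    "required": ["label", "score"],
    "additionalProperties": False,
}

PROMPT = "Classify this text: ..."

with_schema = client.responses.create(
    model="gpt-4.1",
    input=PROMPT,
    text={
        "format": {
            "type": "json_schema",
            "name": "classification",
            "schema": SCHEMA,
            "strict": True,
        }
    },
)

without_schema = client.responses.create(model="gpt-4.1", input=PROMPT)

# If the schema was actually placed in context, input_tokens should be
# noticeably higher on the first call; identical counts suggest it was dropped.
print("with schema:   ", with_schema.usage.input_tokens)
print("without schema:", without_schema.usage.input_tokens)
```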
Repair:
Just one more thing for OpenAI to fix to make this failure impossible, and then to implement configuration management and beta cycles so they stop delivering immediate overnight breakage… a minimum viable product is not a viable product.