When OpenAI announced Structured Output in August of last year, we understood that strict mode would guarantee adherence to the provided JSON schema, both for tool calling and model responses.
Here are some direct quotes from the original announcement (archived at Introducing Structured Outputs in the API | OpenAI):
Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
On our evals of complex JSON schema following, our new model gpt-4o-2024-08-06 with Structured Outputs scores a perfect 100%. In comparison, gpt-4-0613 scores less than 40%.
With Structured Outputs, gpt-4o-2024-08-06 achieves 100% reliability in our evals, perfectly matching the output schemas.
From the OpenAI Cookbook:
Structured Outputs is a new capability in the Chat Completions API and Assistants API that guarantees the model will always generate responses that adhere to your supplied JSON Schema.
However, in practice, it seems like strict doesn’t actually guarantee anything—at best, it reduces the likelihood of invalid outputs. If you nudge the model (intentionally or accidentally) to break the schema, it often will—even with simple schemas. With more complex schemas, failures become even more pronounced, a far cry from “perfect 100%” reliability.
Reproducing the issue
For anyone who wants to test this:
- Fire up the Playground.
- Add a function with this schema:
{
  "name": "searchProducts",
  "description": "Search for products.",
  "strict": true,
  "parameters": {
    "type": "object",
    "required": ["filters"],
    "properties": {
      "filters": {
        "type": "object",
        "anyOf": [
          {
            "type": "object",
            "required": ["homeClub", "operator", "value"],
            "properties": {
              "value": {
                "type": ["string", "null"],
                "description": "Home club name."
              },
              "homeClub": {
                "enum": ["homeClub"],
                "type": "string",
                "description": "The string \"homeClub\"."
              },
              "operator": {
                "enum": ["eq", "ne"],
                "type": "string",
                "description": "The operator to use for filtering."
              }
            },
            "additionalProperties": false
          }
        ],
        "description": "The product filter conditions to apply."
      }
    },
    "additionalProperties": false
  }
}
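As an aside, violations of this schema can at least be caught client-side before the tool call is executed. Below is a minimal sketch of such a guard; `validate_filters` is a hypothetical helper written for this post (not part of the OpenAI SDK), and it only checks the single `anyOf` branch of the schema above:

```python
import json

def validate_filters(arguments_json: str) -> list[str]:
    """Hypothetical guard for the searchProducts schema above.

    Returns a list of violations (empty means the arguments conform)."""
    errors = []
    args = json.loads(arguments_json)
    filters = args.get("filters")
    if not isinstance(filters, dict):
        return ["'filters' must be an object"]
    # All three keys are required by the anyOf branch.
    for key in ("homeClub", "operator", "value"):
        if key not in filters:
            errors.append(f"missing required key: {key}")
    # Enum and type constraints.
    if filters.get("homeClub") not in (None, "homeClub"):
        errors.append("'homeClub' must be the literal string \"homeClub\"")
    if "operator" in filters and filters["operator"] not in ("eq", "ne"):
        errors.append("'operator' must be 'eq' or 'ne'")
    if "value" in filters and not isinstance(filters["value"], (str, type(None))):
        errors.append("'value' must be a string or null")
    # additionalProperties: false
    extra = set(filters) - {"homeClub", "operator", "value"}
    if extra:
        errors.append(f"unexpected keys: {sorted(extra)}")
    return errors

# The invalid output shown later in this post fails the check:
bad = '{"filters": {"homeClub": "Manchester United"}}'
print(validate_filters(bad))  # missing operator/value, wrong homeClub literal

good = ('{"filters": {"homeClub": "homeClub", '
        '"operator": "eq", "value": "Manchester United"}}')
print(validate_filters(good))  # []
```

(A real implementation would use a proper JSON Schema validator; this just illustrates that the violations are trivially detectable.)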
- Use this system prompt, which shows the model an invalid set of arguments for the searchProducts function, nudging it to break the schema:
searchProducts({
  "filters": {
    "homeClub": "Manchester United"
  }
})
- Then, try asking the model for “Manchester United”.
Sometimes you’ll get the desired arguments:
searchProducts({
  "filters": {
    "homeClub": "homeClub",
    "value": "Manchester United",
    "operator": "eq"
  }
})
… but most of the time, it will break the schema and return something more like the (invalid) example in the system prompt:
searchProducts({
  "filters": {
    "homeClub": "Manchester United"
  }
})
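So for now we're back to the validate-and-retry pattern that Structured Outputs was supposed to make unnecessary. A rough sketch of that workaround follows; `call_model` stands in for a real Chat Completions request returning the tool-call arguments as a JSON string, and is stubbed here for illustration:

```python
import json

REQUIRED_KEYS = {"homeClub", "operator", "value"}

def is_valid(arguments_json: str) -> bool:
    # Accept only arguments whose "filters" object has exactly the
    # three keys required by the searchProducts schema above.
    try:
        filters = json.loads(arguments_json)["filters"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False
    return isinstance(filters, dict) and set(filters) == REQUIRED_KEYS

def call_with_retry(call_model, max_attempts: int = 3) -> str:
    # Re-request until the tool-call arguments pass validation.
    for _ in range(max_attempts):
        arguments = call_model()
        if is_valid(arguments):
            return arguments
    raise ValueError(f"schema still violated after {max_attempts} attempts")

# Stubbed model: breaks the schema once, then complies (illustration only).
responses = iter([
    '{"filters": {"homeClub": "Manchester United"}}',
    '{"filters": {"homeClub": "homeClub", '
    '"operator": "eq", "value": "Manchester United"}}',
])
result = call_with_retry(lambda: next(responses))
print(json.loads(result)["filters"]["operator"])  # eq
```

This works, but it burns tokens and latency on exactly the retries the announcement said we could stop writing.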
And it’s not just functions where we’re unable to get strict mode to work properly—we have the same problem when using json_schema for the model response format.
Is this everyone else’s experience, too? That strict mode helps (maybe) but does not force the model to adhere to the schema? I’m posting this in the hope that we’re just doing something wrong—because that would be really great news.
We’ve also noticed that in the Chat Completions dashboard, functions that we set to strict: true will appear as strict: false. Maybe just a UI bug, but suspicious nonetheless.
Thanks everyone.