Introducing Structured Outputs

Today we launched Structured Outputs in the API — model outputs now reliably adhere to developer-supplied JSON Schemas. This feature was one of our most highly requested for the API, as managing responses that don’t match your schemas has been a challenge ever since we released function calling.

You can use Structured Outputs in two ways:

  1. When you use function calling, supply strict: true within your function definition to ensure that the model’s output will match your exact schema.
  2. You can now supply a json_schema as a response_format option, along with a full definition of your schema. Set strict: true to ensure that model outputs will reliably match your schema.
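The two options above differ mainly in where the schema and the `strict` flag sit in the request. Below is a minimal sketch of both request shapes as plain Python dicts, so nothing is sent over the network. The layout (`strict` inside a `tools` entry's `function`, and a `response_format` of type `json_schema`) appears to follow the Chat Completions API and matches the example later in this thread. The `get_weather` function, the `weather_query` name, and the schema itself are hypothetical.

```python
from typing import Any

# Hypothetical schema for illustration. Strict mode expects every property
# to appear in "required" and "additionalProperties" to be False.
WEATHER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city", "unit"],
    "additionalProperties": False,
}


def function_calling_tool(schema: dict[str, Any]) -> dict[str, Any]:
    """Option 1: a tool definition with strict: true inside the function."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Look up the current weather for a city.",
            "strict": True,
            "parameters": schema,
        },
    }


def json_schema_response_format(schema: dict[str, Any]) -> dict[str, Any]:
    """Option 2: a response_format of type json_schema with strict: true."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "weather_query",  # hypothetical schema name
            "strict": True,
            "schema": schema,
        },
    }
```

In a real call, the first dict would go into `tools=[...]` and the second would be passed as `response_format=...` to `client.chat.completions.create`.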

Read more about enabling Structured Outputs in the guide.

Structured Outputs with function calling is supported on all models that support function calling, including our gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, and gpt-3.5-turbo models.

Structured Outputs with the new response_format param is supported on gpt-4o-mini and our new model: gpt-4o-2024-08-06, released today.

By switching to the new gpt-4o-2024-08-06, developers save 50% on inputs ($2.50/1M input tokens) and 33% on outputs ($10.00/1M output tokens) compared to gpt-4o-2024-05-13.
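As a worked example of the pricing above, here is a small cost calculator. The gpt-4o-2024-08-06 prices come from this announcement. The gpt-4o-2024-05-13 prices ($5.00 input, $15.00 output per 1M tokens) are back-calculated from the stated 50% and 33% savings. The workload sizes in the usage note are hypothetical.

```python
# USD per 1M tokens. gpt-4o-2024-08-06 prices are from the announcement;
# gpt-4o-2024-05-13 prices are back-calculated from the stated savings:
# $2.50 is 50% off $5.00, and $10.00 is 33% off $15.00.
PRICES_PER_M = {
    "gpt-4o-2024-05-13": {"input": 5.00, "output": 15.00},
    "gpt-4o-2024-08-06": {"input": 2.50, "output": 10.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at list price, ignoring any other discounts."""
    price = PRICES_PER_M[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def savings_fraction(input_tokens: int, output_tokens: int) -> float:
    """Fraction saved by moving the same workload to gpt-4o-2024-08-06."""
    old = cost_usd("gpt-4o-2024-05-13", input_tokens, output_tokens)
    new = cost_usd("gpt-4o-2024-08-06", input_tokens, output_tokens)
    return 1 - new / old
```

For a hypothetical workload of 10M input and 2M output tokens, this gives $80.00 on gpt-4o-2024-05-13 and $45.00 on gpt-4o-2024-08-06, a 43.75% saving. The blended saving always falls between 33% (output only) and 50% (input only), depending on the input/output mix.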

We’re excited to see what you build with Structured Outputs. Happy hacking!

This is epic. This solves a lot of my problems on two specific projects my teams and I are running. Also, gpt-4o has been saving us some dollars, which is helpful for bootstrappers like us.

I’m worried about this limitation:

The first API response with a new schema will incur additional latency, but subsequent responses will be fast with no latency penalty. This is because during the first request, we process the schema as indicated above and then cache these artifacts for fast reuse later on. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.

Do you have any advice for this?

Nice! How are cost and tokens calculated? Is everything (the schema, the output JSON, etc.) counted as tokens as usual?

As an example: if we submit a large schema with a short prompt, the majority of the cost will come from the schema. Is that correct?

The schema counts towards input token usage.
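Because the schema is billed as input, one way to check the large-schema, short-prompt case is to send the same messages once with and once without `response_format` and compare `usage.prompt_tokens`. The helper below only does the arithmetic. It assumes that the entire difference is attributable to the schema, which is an assumption rather than documented behavior. It uses the gpt-4o-2024-08-06 input price from the announcement and covers input cost only, since output tokens are billed separately.

```python
from typing import NamedTuple

# gpt-4o-2024-08-06 input price (USD per 1M tokens), from the announcement.
INPUT_PRICE_PER_M = 2.50


class SchemaOverhead(NamedTuple):
    schema_tokens: int
    share_of_input: float
    schema_cost_usd: float


def schema_overhead(prompt_tokens_with_schema: int, prompt_tokens_without_schema: int) -> SchemaOverhead:
    """Attribute the prompt_tokens difference between two otherwise identical
    requests (one with response_format, one without) to the schema."""
    if prompt_tokens_with_schema <= 0:
        raise ValueError("prompt_tokens_with_schema must be positive")
    if prompt_tokens_without_schema > prompt_tokens_with_schema:
        raise ValueError("the request without the schema should not use more input tokens")
    schema_tokens = prompt_tokens_with_schema - prompt_tokens_without_schema
    return SchemaOverhead(
        schema_tokens=schema_tokens,
        share_of_input=schema_tokens / prompt_tokens_with_schema,
        schema_cost_usd=schema_tokens * INPUT_PRICE_PER_M / 1_000_000,
    )
```

With made-up counts of 1,200 prompt tokens with the schema and 200 without, the schema accounts for 1,000 tokens, about 83% of the input.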

Design your architecture so it’s resilient to such latency when it occurs?

It’s not like caching speedups are a new thing :).
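A minimal sketch of one such design is shown below. It gives a schema a longer timeout the first time this process uses it and optionally sends each schema once at deploy time, so user-facing requests are less likely to pay the first-request cost. `SchemaWarmup` and its `send` callback are hypothetical. The server-side cache is not visible to clients, so this only tracks local history. The timeout defaults are arbitrary, with the first-call value set above the "up to a minute" figure quoted above.

```python
import hashlib
import json
from collections.abc import Callable, Iterable
from typing import Any

Schema = dict[str, Any]


class SchemaWarmup:
    """Hypothetical helper: remembers which schemas this process has already sent."""

    def __init__(self, first_call_timeout: float = 90.0, normal_timeout: float = 30.0) -> None:
        self.first_call_timeout = first_call_timeout
        self.normal_timeout = normal_timeout
        self._seen: set[str] = set()

    @staticmethod
    def fingerprint(schema: Schema) -> str:
        # Hash the schema in insertion order, as it would be serialized in the
        # request body. A reordered schema counts as new, because we cannot
        # tell whether the server-side cache normalizes key order.
        body = json.dumps(schema, separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def timeout_for(self, schema: Schema) -> float:
        """Longer budget the first time a schema is used, the normal one afterwards."""
        if self.fingerprint(schema) in self._seen:
            return self.normal_timeout
        return self.first_call_timeout

    def mark_seen(self, schema: Schema) -> None:
        self._seen.add(self.fingerprint(schema))

    def warm_up(self, schemas: Iterable[Schema], send: Callable[[Schema, float], None]) -> None:
        """Send each not-yet-seen schema once via `send` (e.g. at deploy time)."""
        for schema in schemas:
            if self.fingerprint(schema) not in self._seen:
                send(schema, self.first_call_timeout)
                # Only reached if send() did not raise, so failures are retried later.
                self.mark_seen(schema)
```

In practice, `send` would make a small request that uses the schema. If it raises, the schema stays unmarked and keeps the longer timeout.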

Is this new functionality, specifically the new “json_schema” response_format type, already available through the Azure gpt-4o-2024-08-06 model?

Most likely not yet. I typically check the Microsoft Learn page “What's new in Azure OpenAI Service?” for updates.

EDIT: It seems to be available now:
https://azure.microsoft.com/en-us/blog/announcing-a-new-openai-feature-for-developers-on-azure/

Hello, this is a great update to have! I tried to extract logprobs together with a predefined json_schema using the beta chat completions client, but it doesn’t seem to return the logprobs:

ChoiceLogprobs(content=None, refusal=None)

Hey, great update, thanks! I am trying to use the curl example but am getting the following reply: “message”: “Unrecognized request arguments supplied: messages, response_format”

It was copied and pasted from the example in the documentation, so I’m not sure whether others are seeing this too.

Hi! I’m curious why gpt-4o in the API is not pinned to this newest version, considering that it is cheaper. Is that going to happen eventually?

This is great for my research project, which compares 4o and 4o-mini as base models for an assistant in terms of user experience. Structured Outputs with response_format works well with the new 4o model but doesn’t seem to work with 4o-mini yet. I continually get an error when switching from 4o to 4o-mini without changing anything else. Anyway, I can’t wait to integrate this feature into my other projects soon.

Hey, it normally takes longer for the newest model to be pinned while they work out errors.

Amazing!
What an update! I was looking for this for a very long time.

When using a JSON schema, the reply seems to be pretty-printed, which blows up the token count. Per tiktoken, a recent reply I got was 337 tokens with spaces and newlines and only 201 without.

Is there a way to avoid pretty-printing it?

That seems pretty extreme.

Do you have an example you can share?

How was your schema formatted?

Edit:

I can confirm the difference between

{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": [
              "GML",
              "XML"
            ]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}

and the compact, minified version: the pretty-printed form takes roughly 60% more tokens (159:100).

I’m not able to test right now, but it will be useful to see if a minified schema makes any difference.

Edit 2: Incidentally, the same content in YAML[1] is 110 tokens unminified and 107 tokens minified.

I do wonder how the attention mechanism handles minified JSON vs readable JSON vs YAML…


[1] My personally preferred structured format.

Sorry, I think I was unclear. I’m not talking about minification in the JavaScript sense, where keys are made very short and seemingly meaningless. I mean that it’s literally returning multiple spaces and newlines, as if it were pretty-printing the JSON.

import json

import tiktoken
from openai import OpenAI


def make_parameters(frogs: list[str]) -> dict:
    """Build chat.completions.create() arguments with one schema key per frog."""
    assert len(frogs) <= 25, "OpenAI can only handle 25 frogs at once with this schema"
    messages = [
        {"role": "system", "content": "You are a biologist specializing in aquatics who has been tasked with describing frogs."},
        {"role": "user", "content": "For each frog, please describe its color, temperament, and taste in two words or less. Reply in JSON."},
    ]
    # Each frog name becomes a required key whose value is a strict nested object.
    json_schema = {
        "type": "object",
        "properties": {
            v: {
                "type": "object",
                "properties": {
                    "color": {"type": "string"},
                    "temperament": {"type": "string"},
                    "taste": {"type": "string"},
                },
                "required": ["color", "temperament", "taste"],
                "additionalProperties": False,
            }
            for v in frogs
        },
        "required": frogs,
        "additionalProperties": False,
    }

    return {
        "messages": messages,
        "model": "gpt-4o-mini",
        "temperature": 0.1,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "summarization",
                "strict": True,
                "schema": json_schema,
            },
        },
    }


client = OpenAI()
parameters = make_parameters(["green tree frog", "poison arrow frog", "peeper", "bullfrog"])

# Use the model's own tokenizer so local counts can be compared with answer.usage.
encoder = tiktoken.encoding_for_model(parameters["model"])
answer = client.chat.completions.create(**parameters)

message = answer.choices[0].message.content
local_output_tokens = len(encoder.encode(message))
# Re-serialize the reply with no whitespace outside string literals.
compact = json.dumps(json.loads(message), separators=(",", ":"))

print("encoder is valid?", local_output_tokens == answer.usage.completion_tokens)
print("with spaces:", local_output_tokens)
print("without whitespace:", len(encoder.encode(compact)))

When I ran this, the completion contained 141 tokens including whitespace that’s irrelevant to the JSON’s meaning (middle print), or 87 when I stripped it (last print). I am now noticing that some other prompts are less biased towards adding irrelevant whitespace, but it seems like it would be in OpenAI’s interest to (at least internally) change their token constraints to not generate irrelevant whitespace.

No, I understood. Perhaps I was confusing? LOL…

Minification of JSON, to the best of my knowledge, is interchangeable with compactification: single line, no whitespace outside of string literals.

I agree—it could be in everyone’s interest if it returned compact JSON.
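If you only need compact JSON downstream (for storage, logging, or feeding the reply into a later prompt), a minimal client-side workaround is to parse the reply and re-serialize it. This does not reduce the completion tokens already billed for the pretty-printed reply. The sketch counts characters rather than tokens so it needs only the standard library, and the tiktoken script earlier in the thread gives real token counts. Note that number formatting can change on the round trip: for example, `1e2` comes back as `100.0`.

```python
import json


def compact_json(text: str) -> str:
    """Re-serialize JSON with no whitespace outside string literals."""
    # separators=(",", ":") drops the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII text instead of \u escapes.
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def whitespace_overhead(text: str) -> float:
    """Fraction of characters removed by compacting (a rough, tokenizer-free proxy)."""
    return 1 - len(compact_json(text)) / len(text)


pretty_reply = '{\n  "bullfrog": {\n    "color": "olive green",\n    "taste": "earthy"\n  }\n}'
print(compact_json(pretty_reply))  # {"bullfrog":{"color":"olive green","taste":"earthy"}}
```

The space inside `"olive green"` survives because it is part of a string literal. Only the indentation and newlines between tokens are removed.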
