Structured Output - Why does creating a CFG take a decent amount of time?

From the announcement "Introducing Structured Outputs in the API":

The first API response with a new schema will incur additional latency, but subsequent responses will be fast with no latency penalty. This is because during the first request, we process the schema as indicated above and then cache these artifacts for fast reuse later on. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
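For context, here is roughly the kind of call I am making, using the Python SDK's Pydantic parse helper (the model name is just a placeholder):

from openai import OpenAI
from pydantic import BaseModel

class Joke(BaseModel):
    setup: str
    punchline: str

client = OpenAI()

# The first call with this schema pays the schema-processing cost described above;
# later calls with the same schema should reuse the cached artifacts.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # placeholder model name
    messages=[{"role": "user", "content": "Tell me a joke."}],
    response_format=Joke,
)
print(completion.choices[0].message.parsed)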

I am wondering why generating the context-free grammar for a JSON schema takes a non-negligible amount of time. I was expecting it to be relatively quick, since the grammar “just” forces the LLM to produce certain keys in order and guarantees valid JSON.

For a class Joke(setup: str, punchline: str), I am picturing a CFG that looks like this:

Joke      → "{" Setup "," Punchline "}"
Setup     → '"setup":' String
Punchline → '"punchline":' String

(Instead of thinking about how the keys can be split into tokens and making the LLM generate them, we could just insert the keys directly into the text mid-generation.)
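To make my mental model concrete, here is a naive sketch of the kind of schema-to-grammar conversion I am imagining. This is purely an illustration of my assumption, not how the API actually builds its grammars, and it only handles a flat object schema with string-valued properties:

# Naive sketch: turn a flat object schema (string-valued properties only)
# into grammar rules like the ones above.
def schema_to_grammar(name: str, schema: dict) -> str:
    props = schema["properties"]
    # Object rule: fixed keys, in order, with literal punctuation inserted directly.
    body = ' "," '.join(prop.capitalize() for prop in props)
    rules = [f'{name} -> "{{" {body} "}}"']
    for prop in props:
        # Each key is emitted verbatim; only the value is left to the model.
        rules.append(f'{prop.capitalize()} -> \'"{prop}":\' String')
    rules.append('String -> \'"\' Characters \'"\'')  # escaping details omitted
    return "\n".join(rules)

joke_schema = {
    "type": "object",
    "properties": {"setup": {"type": "string"}, "punchline": {"type": "string"}},
    "required": ["setup", "punchline"],
    "additionalProperties": False,
}
print(schema_to_grammar("Joke", joke_schema))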

What would be an example of a JSON schema that takes a minute to process? Is it mostly cases where a value is an enum rather than an unrestricted string?
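For concreteness, this is the kind of enum-heavy schema I have in mind when asking that (the field names and enum values are made up):

# Hypothetical schema with enum-constrained values instead of free-form strings.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "bug_report", "feature_request", "account", "other"],
        },
        "priority": {"type": "string", "enum": ["low", "medium", "high", "urgent"]},
        "summary": {"type": "string"},  # unrestricted string, for comparison
    },
    "required": ["category", "priority", "summary"],
    "additionalProperties": False,
}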