Structured Output - Why does creating a CFG take a decent amount of time?

From the announcement "Introducing Structured Outputs in the API":

The first API response with a new schema will incur additional latency, but subsequent responses will be fast with no latency penalty. This is because during the first request, we process the schema as indicated above and then cache these artifacts for fast reuse later on. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
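For context, here is roughly the kind of call I am making, using the Python SDK's Pydantic parse helper (the model name is just a placeholder):

from openai import OpenAI
from pydantic import BaseModel

class Joke(BaseModel):
    setup: str
    punchline: str

client = OpenAI()

# The first call with this schema pays the schema-processing cost described above;
# later calls with the same schema should reuse the cached artifacts.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # placeholder model name
    messages=[{"role": "user", "content": "Tell me a joke."}],
    response_format=Joke,
)
print(completion.choices[0].message.parsed)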

I am wondering why generating the context-free grammar for a JSON schema takes a non-negligible amount of time. I was expecting it to be relatively quick, since the grammar “just” forces the LLM to produce certain keys in order and guarantees valid JSON.

For a class Joke(setup: str, punchline: str), I am picturing a CFG that looks like this:

Joke      → "{" Setup "," Punchline "}"
Setup     → '"setup":' String
Punchline → '"punchline":' String

(Instead of thinking about how the keys can be split into tokens and making the LLM generate them, we could just insert the keys directly into the text mid-generation.)
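To make my mental model concrete, here is a naive sketch of the kind of schema-to-grammar conversion I am imagining. This is purely an illustration of my assumption, not how the API actually builds its grammars, and it only handles a flat object schema with string-valued properties:

# Naive sketch: turn a flat object schema (string-valued properties only)
# into grammar rules like the ones above.
def schema_to_grammar(name: str, schema: dict) -> str:
    props = schema["properties"]
    # Object rule: fixed keys, in order, with literal punctuation inserted directly.
    body = ' "," '.join(prop.capitalize() for prop in props)
    rules = [f'{name} -> "{{" {body} "}}"']
    for prop in props:
        # Each key is emitted verbatim; only the value is left to the model.
        rules.append(f'{prop.capitalize()} -> \'"{prop}":\' String')
    rules.append('String -> \'"\' Characters \'"\'')  # escaping details omitted
    return "\n".join(rules)

joke_schema = {
    "type": "object",
    "properties": {"setup": {"type": "string"}, "punchline": {"type": "string"}},
    "required": ["setup", "punchline"],
    "additionalProperties": False,
}
print(schema_to_grammar("Joke", joke_schema))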

What would be an example of a JSON schema that takes a minute to process? Is it mostly cases where a value is an enum rather than an unrestricted string?
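For concreteness, this is the kind of enum-heavy schema I have in mind when asking that (the field names and enum values are made up):

# Hypothetical schema with enum-constrained values instead of free-form strings.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "bug_report", "feature_request", "account", "other"],
        },
        "priority": {"type": "string", "enum": ["low", "medium", "high", "urgent"]},
        "summary": {"type": "string"},  # unrestricted string, for comparison
    },
    "required": ["category", "priority", "summary"],
    "additionalProperties": False,
}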