If you search the structured outputs or function calling documentation, this is the ONLY mention that appears of the initial latency of setting up a context-free grammar:
Specifically, for fine-tuned models:
- Schemas undergo additional processing on the first request (and are then cached). If your schemas vary from request to request, this may result in higher latencies.
Why would this only be mentioned for fine-tuned models (besides the fact that they are slow to run when cold)?
The CFG construction latency is real and impactful on the fastest model OpenAI’s got, fine-tuning or not.
Shown below: 10 seconds to get “hello” from gpt-4.1-nano with six functions:
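A minimal sketch of how the timing can be reproduced, assuming the standard `openai` Python SDK; the six placeholder tool schemas are illustrative, not the exact functions from that test:

```python
import time
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Six illustrative strict tool schemas, just to trigger schema/CFG processing.
tools = [
    {
        "type": "function",
        "function": {
            "name": f"lookup_{i}",
            "description": f"Placeholder function {i}.",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
                "additionalProperties": False,
            },
        },
    }
    for i in range(6)
]

def timed_call():
    """Time a single request that carries the strict tool schemas."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": "Say hello."}],
        tools=tools,
        max_tokens=16,
    )
    return time.perf_counter() - start, response.choices[0].message.content

# First call: schemas are processed and cached; the large delay shows up here.
print("cold: %.2fs -> %r" % timed_call())
# Second call with identical schemas: should hit the cache and return quickly.
print("warm: %.2fs -> %r" % timed_call())
```

On the first call the strict schemas get compiled and cached; the second, identical call should come back in a fraction of that time.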
The documentation downplays this.
Also not mentioned is the lifetime of the cache behind this structured output enforcement: whether it uses the same per-server-instance hash-based caching just clarified in the documentation for the context window caching discount, or a more persistent mechanism, and what the scope of that persistence is across projects and across organizations.
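Absent documentation, the only way to answer these questions is empirically. A rough probe, reusing `timed_call()` and `tools` from the sketch above; the interval schedule and the `OTHER_PROJECT_KEY` placeholder are assumptions, not anything documented:

```python
import time
from openai import OpenAI

# Probe 1: cache lifetime. Re-send the identical schemas after increasing idle
# periods and watch for the cold-start latency to reappear.
for wait_minutes in (1, 5, 15, 60):
    time.sleep(wait_minutes * 60)
    elapsed, _ = timed_call()  # helper from the sketch above
    print(f"after {wait_minutes} min idle: {elapsed:.2f}s")

# Probe 2: cache scope. OTHER_PROJECT_KEY is a hypothetical key belonging to a
# different project or organization; if its first call with the same tools is
# already fast, the cache is broader than per-project.
other_client = OpenAI(api_key="OTHER_PROJECT_KEY")
start = time.perf_counter()
other_client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": "Say hello."}],
    tools=tools,  # identical schemas from the sketch above
    max_tokens=16,
)
print(f"other project, first call: {time.perf_counter() - start:.2f}s")
```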