Structured Outputs tokens and latency

Hi,

Structured Outputs is a very welcome new feature, and one we have been looking forward to for a while.

Our app, Onsen (onsenapp dot com), an AI companion for mental health, makes extensive use of JSON prompts to dynamically generate its UI. Until now we have relied heavily on `response_format` with `type: json_object`, and we are excited about the new opportunities that Structured Outputs brings.

Two questions / concerns:

1) Token use

I have seen it mentioned elsewhere that the JSON schema definition counts towards input token usage. However, I am unable to confirm this.

For example, when I attach a large, complex JSON schema containing a single string field with very long enum values (thousands of characters), my prompt usage stats do not show any increase in tokens.
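For context, the test payload was shaped roughly like this. This is a hypothetical reconstruction, not the actual app schema: the name `ui_component` and the enum values are illustrative placeholders, and the real enum strings run to thousands of characters.

```python
# Hypothetical reconstruction of the kind of response_format payload used
# in the token-usage experiment. In the real test, the enum list below
# contained thousands of characters of values.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ui_component",   # illustrative name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "widget": {
                    "type": "string",
                    # placeholder enum values; the real ones were very long
                    "enum": ["long-enum-value-one", "long-enum-value-two"],
                },
            },
            "required": ["widget"],
            "additionalProperties": False,
        },
    },
}
```

If the schema were billed as input, passing this payload should raise `usage.prompt_tokens` on the response; in my tests it did not.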

It seems that either the JSON schema is applied for “free”, or the API is not correctly reporting the schema tokens in the usage metrics.

2) Latency

The docs say that the first time a prompt with structured outputs runs, it takes extra time to create some artefacts (presumably caching the JSON schema artefacts in an in-memory DB somewhere).

However, I am also seeing a significant increase in latency on repeat runs of the same prompt. For example, a prompt that takes ~3 seconds with plain `json_object` consistently takes ~4-5 seconds with the new `json_schema` structured output.
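For anyone wanting to reproduce this, here is a minimal timing helper (a sketch; the API request itself is omitted and would be wrapped in the callable you pass in):

```python
import time
import statistics

def time_calls(fn, n=5):
    """Call fn() n times and return the per-call wall-clock latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies

# Usage sketch: wrap each request style in a zero-argument callable,
# e.g. lambda: client.chat.completions.create(...), then compare
# statistics.median(time_calls(json_object_call)) against
# statistics.median(time_calls(json_schema_call)).
```

Using the median over several repeat runs (after a warm-up call) helps separate the documented first-run schema-processing cost from a genuine steady-state difference.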

Are others experiencing this?

Can OpenAI comment on this undocumented additional latency?


Probably not. Not trying to be mean; they just mostly work on these things silently.

I’m sure this would have been a documented behaviour if it were intended, though.

You are constraining the model to a specific grammar and set of options. Although we can’t see behind the scenes, it’s expected to take (very slightly) longer the more complex the structure. I’m surprised to see it take longer than `json_object`, though. Strange.
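To illustrate why constrained generation can add per-token work, here is a toy sketch (not OpenAI’s actual implementation): at each decoding step the sampler may only pick from the tokens the grammar currently allows, which means extra bookkeeping on every token.

```python
# Toy illustration of grammar-constrained decoding. `logits` maps token
# strings to scores; `allowed` is the set of tokens the grammar permits
# at this point in the output.
def constrained_pick(logits, allowed):
    """Pick the highest-scoring token among the grammar-allowed set."""
    return max(allowed, key=lambda tok: logits[tok])

logits = {"{": 1.2, "hello": 3.5, '"': 0.9}
# Unconstrained, the model would emit "hello" (highest score). A JSON
# grammar at the start of an object only allows "{" or a quote, so the
# constrained pick is "{" instead.
```

Maintaining that allowed-token set for a complex schema is plausible overhead, but it shouldn’t obviously cost a full second or more per request, which is why the reported numbers are surprising.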

I would recommend checking out the open-source variations that OpenAI built on. They offer more steerability, depth, and transparency into what is happening.