Structured Outputs tokens and latency

Hi,

Structured Outputs is a very welcome new feature, and one we have been looking forward to for a while.

Our app, Onsen (onsenapp dot com), an AI companion for mental health, makes extensive use of JSON prompts to dynamically generate its UI. Until now we have relied heavily on `response_format` with `type: json_object`, and we are excited about the new opportunities that Structured Outputs brings.

Two questions / concerns:

1) Token use

I have seen it mentioned elsewhere that the JSON schema definition counts towards input token usage. However, I am unable to confirm this.

For example, when I attach a large, complex JSON schema containing a single string field with very long enum values (thousands of characters), my prompt usage stats do not show any increase in tokens.
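For context, the test payload was shaped roughly like this. This is a hypothetical reconstruction, not the actual app schema: the name `ui_component` and the enum values are illustrative placeholders, and the real enum strings run to thousands of characters.

```python
# Hypothetical reconstruction of the kind of response_format payload used
# in the token-usage experiment. In the real test, the enum list below
# contained thousands of characters of values.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ui_component",   # illustrative name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "widget": {
                    "type": "string",
                    # placeholder enum values; the real ones were very long
                    "enum": ["long-enum-value-one", "long-enum-value-two"],
                },
            },
            "required": ["widget"],
            "additionalProperties": False,
        },
    },
}
```

If the schema were billed as input, passing this payload should raise `usage.prompt_tokens` on the response; in my tests it did not.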

It seems that either the JSON schema is applied for “free”, or the API is not correctly reporting the schema tokens in the usage metrics.

2) Latency

The docs say that the first time a prompt with structured outputs runs, it takes extra time to create some artefacts (presumably caching the JSON schema artefacts in an in-memory DB somewhere).

However, I am also seeing a significant increase in latency on repeat runs of the same prompt. For example, a prompt that takes ~3 seconds with plain `json_object` consistently takes ~4-5 seconds with the new `json_schema` structured output.
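For anyone wanting to reproduce this, here is a minimal timing helper (a sketch; the API request itself is omitted and would be wrapped in the callable you pass in):

```python
import time
import statistics

def time_calls(fn, n=5):
    """Call fn() n times and return the per-call wall-clock latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies

# Usage sketch: wrap each request style in a zero-argument callable,
# e.g. lambda: client.chat.completions.create(...), then compare
# statistics.median(time_calls(json_object_call)) against
# statistics.median(time_calls(json_schema_call)).
```

Using the median over several repeat runs (after a warm-up call) helps separate the documented first-run schema-processing cost from a genuine steady-state difference.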

Are others experiencing this?

Can OpenAI comment on this undocumented additional latency?


Probably not. Not trying to be mean; they just mostly work on these things silently.

I’m sure this would have been a documented behaviour if it were intended, though.

You are constraining the model to a specific grammar and set of options. Although we can’t see behind the scenes, it’s expected to take (very slightly) longer the more complex the structure. I’m surprised to see it take longer than `json_object`, though. Strange.
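To illustrate why constrained generation can add per-token work, here is a toy sketch (not OpenAI’s actual implementation): at each decoding step the sampler may only pick from the tokens the grammar currently allows, which means extra bookkeeping on every token.

```python
# Toy illustration of grammar-constrained decoding. `logits` maps token
# strings to scores; `allowed` is the set of tokens the grammar permits
# at this point in the output.
def constrained_pick(logits, allowed):
    """Pick the highest-scoring token among the grammar-allowed set."""
    return max(allowed, key=lambda tok: logits[tok])

logits = {"{": 1.2, "hello": 3.5, '"': 0.9}
# Unconstrained, the model would emit "hello" (highest score). A JSON
# grammar at the start of an object only allows "{" or a quote, so the
# constrained pick is "{" instead.
```

Maintaining that allowed-token set for a complex schema is plausible overhead, but it shouldn’t obviously cost a full second or more per request, which is why the reported numbers are surprising.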

I would recommend checking out the open-source variations that OpenAI built on. They offer more steerability, depth, and transparency into what is happening.