Structured Output Latency - How is Caching done?

From the release (paraphrased):

The first request with a new schema will incur a latency penalty of 10 sec to 10 min, depending on schema complexity. Subsequent requests will not pay this penalty, as the schema will be cached.

Is the caching done at the API key + schema level, such that every schema submitted under a given API key is “permanently” saved and all future requests, regardless of how far apart they are in time, benefit from this caching? That would imply persisting cached schemas for each API key.
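
To make the question concrete, here is a minimal sketch of the model I have in mind. It is purely my assumption and not a description of how the service actually works: the names `cache_key`, `get_compiled_schema`, and the in-memory dict are hypothetical, and the "compile" step is a stand-in for whatever slow processing causes the first-request penalty.

```python
import hashlib
import json

# Hypothetical model only: a store keyed on (API key, schema).
# A real service would presumably persist this somewhere durable.
_compiled_schemas = {}


def cache_key(api_key: str, schema: dict) -> str:
    """Build a key from the API key plus a canonical form of the schema."""
    # Sort keys so that two schemas differing only in key order hash the same.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{api_key}:{canonical}".encode("utf-8")).hexdigest()


def get_compiled_schema(api_key: str, schema: dict):
    """Return (compiled_artifact, cache_hit) for this API key and schema."""
    key = cache_key(api_key, schema)
    if key in _compiled_schemas:
        return _compiled_schemas[key], True  # no latency penalty
    # Stand-in for the slow first-request processing.
    compiled = f"compiled:{key[:8]}"
    _compiled_schemas[key] = compiled
    return compiled, False
```

Under this model, the second call with the same API key and schema is a hit no matter how much time has passed, while the same schema under a different API key is a miss. My question is whether that matches the real behavior.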

If not, what is meant by “subsequent requests will not pay the latency penalty”? How are subsequent requests defined? Is it session-based, or defined some other way?

Thanks
-Mark
