That’s not a useful reply. If the latency is 10 seconds to 1 minute, there isn’t really a way to design for that. What we need to understand is how the caching is performed, so we know how often the penalty is incurred. I’ve asked in several places for more detail, but so far there has been no response.