OpenAI has introduced a “Priority” service tier that comes with a higher price but promises latency SLAs for API requests. We have started using it in production, but I have encountered a key issue:
There’s no clear way to verify or measure that the latency SLA is actually being met.
Here’s what I’ve found so far:
The API responses don’t include latency metrics.
The usage dashboard doesn’t expose granular latency metrics either.
If OpenAI is going to offer SLA-backed latency tiers, there should be transparent, observable metrics to validate this. Otherwise, it feels like a black box — you’re paying extra for something you can’t audit.
Has anyone figured out a way to accurately measure latency under the new Priority tier?
Are there plans from OpenAI to expose this information via headers or telemetry?
Hey David, thanks for the feedback! I work on the Platform product team here.
Whether you have a latency SLA when using Priority processing depends on the type of contract you have with OpenAI (typically limited to larger customers working with our sales team). If you’re eligible for the latency SLA, we are working on a new dashboard that will surface the relevant metrics!
Thank you very much for your prompt response. I truly appreciate your attention to this matter and look forward to having the new dashboard as soon as possible.
You would essentially have to run all API requests through your own API proxy or edge layer for logging, capturing either the usage objects and overall response times or, better, logging streamed requests so you can record time-to-first-token. From that you can extract a rolling 5-minute average latency per model when using the priority parameter.
Then, priority processing seems to be metered on the assumption that your application has fairly continuous usage. If the previous metering period had you at 200k TPM and you then jump past 300k within 15 minutes, the excess requests could fall back to standard processing, and thus carry no latency SLA.
That makes matching your measurement algorithm to theirs hard, because the mechanism is under-described. The 5-minute period (sliding window, or fixed intervals?) leaves plenty of room for variance: a burst of fast, high-throughput calls could pull the average down and mask slow individual requests.
The important thing for OpenAI to clarify is whether this parameter does anything at all if you are not a scale-tier-eligible enterprise with an annual spend commitment: is the request denied until the feature is unlocked, or does it simply take your additional payment and do nothing?