While reading through the Priority processing documentation, I had one question about capacity. Can anyone help clarify?
Assuming my requests always stay within the ramp rate limits and my traffic keeps growing, will OpenAI always guarantee enough capacity to handle my requests in the Priority tier? Or might my requests still be downgraded to the Standard tier when overall Priority tier capacity is not enough?
If (a) Priority processing performance is degraded AND (b) a customer’s traffic is ramping too quickly, then some Priority requests may be downgraded to Standard processing instead.
It does not say “OR performance is degraded”. Read literally, the outcome “some Priority requests may be downgraded” cannot be reached by degraded performance alone; it also requires the organization to be over-ramping (which hints at a need for provisioning). Degraded performance without an over-ramp would therefore still fall under the SLA (which, if not met, amounts to “sorry, our bad, contact your sales rep”).
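To make the AND reading concrete, here is a minimal Python sketch. The function name and the two boolean inputs are hypothetical; it encodes only my literal reading of the sentence above, not OpenAI's actual routing logic.

```python
def may_downgrade(priority_degraded: bool, ramping_too_fast: bool) -> bool:
    """Literal reading: BOTH conditions must hold before a downgrade is possible."""
    return priority_degraded and ramping_too_fast


# Print the full truth table for the two conditions.
for degraded in (False, True):
    for ramping in (False, True):
        print(f"degraded={degraded!s:5} ramping={ramping!s:5} "
              f"-> may_downgrade={may_downgrade(degraded, ramping)}")
```

Under this reading, only the last row (both conditions true) allows a downgrade; degraded performance on its own does not.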
The ramp limit is not measured per day or over some other reasonable window. It is measured over 15 minutes of continuous use, and you violate it if your traffic increases by 50% within 15 minutes (against an unmentioned metering period). You only become a candidate for being dropped once you are at 100k TPM. So Priority speed is guaranteed for barely a few users with large inputs before the ramp rule factors in. That is comparable to about 3 scale units.
So this does not tolerate any burstiness or variance, and you would need to have many users, but the penalty is then just a cheaper Standard request.
Overall: this is a price/profit increase for OpenAI on the same compute resources.
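As a rough illustration of that arithmetic, here is a small Python sketch. The function and parameter names are hypothetical, the 100k TPM floor and 50% growth figures are the ones quoted in this post (not verified constants), and comparing against the TPM from 15 minutes earlier is my assumption, since the metering period is not stated.

```python
def exceeds_ramp(prev_tpm: float, curr_tpm: float,
                 floor_tpm: float = 100_000, max_growth: float = 0.50) -> bool:
    """Return True if traffic grew by more than max_growth between two samples.

    prev_tpm:  tokens per minute 15 minutes ago (assumed comparison point)
    curr_tpm:  tokens per minute now
    floor_tpm: below this level the ramp rule is assumed not to apply
    """
    if curr_tpm < floor_tpm:
        return False  # too small for the ramp rule to be considered
    # Strict ">" is an assumption; the wording does not say whether
    # exactly 50% growth counts as a violation.
    return curr_tpm > prev_tpm * (1 + max_growth)


print(exceeds_ramp(100_000, 160_000))  # 60% growth above the floor
print(exceeds_ramp(100_000, 150_000))  # exactly 50% growth
print(exceeds_ramp(10_000, 50_000))    # large growth, but below the floor
```

Combined with the AND reading, a `True` here would only matter while Priority performance is also degraded.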
I expect you are mostly interested in edge cases, like constantly growing demand over an extended period of time. I suggest reaching out to sales@openai.com or your account manager directly, because the general marketing page doesn’t have all the answers.
In general I read it as: if everything else stays the same and the ramp limits are not hit, Priority processing will work as expected.