While benchmarking the Assistants API and searching for possible causes of latency, I found another thread suggesting that latency is tied to account tier.
The rate limits page indicates that your tier limits the number of tokens that can be processed in a given timeframe, but it says nothing about latency.
However, there is a separate page for a scale tier that promises more consistent latency. I found that page through a Google search; I’m not sure what it is or where it’s supposed to be found on the OpenAI website.
Can someone from OpenAI confirm whether or not latency is affected by your tier?
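For reference, my benchmarking is roughly this shape: time each request and look at the latency percentiles. This is a minimal sketch; `call_assistant` here is just a placeholder stub standing in for a real API call, not actual OpenAI client code.

```python
import time
import statistics

def benchmark(fn, n=20):
    """Time n calls to fn and return latency stats in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }

# Placeholder for a real Assistants API call; swap in your client code.
def call_assistant():
    time.sleep(0.01)  # simulate a ~10 ms round trip

stats = benchmark(call_assistant)
print(stats)
```

If tier really does affect latency, I'd expect the p95 numbers to differ between two accounts on different tiers running this same loop.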
Users don’t seem to report it as much as they did a year ago, when OpenAI adopted the dark pattern of degrading low-paying accounts even before any tier system was announced.
That “latency” language was struck from the documentation even while it was still affecting organizations.
There haven’t been many recent complaints from API users about deliberately reduced token production rates. With “99% less expensive AI (for us) than two years ago” announced today at DevDay, there is likely less pressure to shuffle particular developers off to low-performance servers.
Which is good, as “pay the same per call as the rich, get less” is not a good company policy.
Scale tier is for buying $5000 increments of dedicated compute.