While benchmarking the Assistants API and searching for possible causes of latency, I found another thread suggesting that latency is tied to account tier.
The rate limits page indicates that your tier limits the number of tokens that can be processed in a given timeframe, but it says nothing about latency.
However, there is a separate page for a scale tier that promises more consistent latency. I found that page through a Google search; I’m not sure what it is or where it’s supposed to be found on the OpenAI website.
Can someone from OpenAI confirm whether or not latency is affected by your tier?
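For reference, my benchmarking is roughly this shape: time each request and look at the latency percentiles. This is a minimal sketch; `call_assistant` here is just a placeholder stub standing in for a real API call, not actual OpenAI client code.

```python
import time
import statistics

def benchmark(fn, n=20):
    """Time n calls to fn and return latency stats in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }

# Placeholder for a real Assistants API call; swap in your client code.
def call_assistant():
    time.sleep(0.01)  # simulate a ~10 ms round trip

stats = benchmark(call_assistant)
print(stats)
```

If tier really does affect latency, I'd expect the p95 numbers to differ between two accounts on different tiers running this same loop.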
Users don’t seem to report it as much as they did a year ago, when OpenAI adopted the dark pattern of degrading low-paying accounts even before any tier system was announced.
That “latency” language was struck from the documentation even while it was still affecting organizations.
There haven’t been many recent complaints from API users about deliberately reduced token production rates. With “99% less expensive AI (for us) than two years ago” announced today at DevDay, there is likely less pressure to shuffle particular developers off to low-performance servers.
Which is good, as “pay the same per call as the rich, get less” is not a good company policy.
Scale tier is for buying $5000 increments of dedicated compute.