They aren’t going to write directly: “We took a batch of low-value accounts and routed them through an API filter buffer that simulates slow output to decrease their satisfaction. Goal: get them to pay more to return to normal.”
OpenAI rewrote the text on the “rate limits” page to:
“Organizations in higher tiers also get access to lower latency models.”
Previously: " “As your usage tier increases, we may also move your account onto lower latency models behind the scenes.”"
Lower-latency “models” makes no sense. Why leave a generation running on an overloaded, time-sliced server when it is more efficient to generate 100 tokens per second and free that processing unit for the next user? The only reason would be if there were no way to serve the current customer load without renting slow, energy-inefficient GPU instances built on older technology.
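A toy back-of-envelope illustrates the occupancy argument. All numbers here are hypothetical, and real inference servers batch many concurrent requests per GPU, so this is only a sketch of the reasoning, not how any provider actually schedules:

```python
# Toy model: one request occupies one processing slot until it finishes.
# Hypothetical numbers; real serving uses continuous batching, so this
# only illustrates why slow output wastes capacity on the same hardware.

RESPONSE_TOKENS = 1_000  # length of a typical completion (assumed)


def slot_seconds(tokens_per_second: float) -> float:
    """Seconds a request occupies its slot at a given output speed."""
    return RESPONSE_TOKENS / tokens_per_second


def requests_per_slot_hour(tokens_per_second: float) -> float:
    """How many such requests one slot can serve per hour."""
    return 3600 / slot_seconds(tokens_per_second)


for tps in (100, 20):
    print(
        f"{tps:>3} tok/s: {slot_seconds(tps):4.0f} s per request, "
        f"{requests_per_slot_hour(tps):4.0f} requests per slot-hour"
    )

# 100 tok/s:   10 s per request,  360 requests per slot-hour
#  20 tok/s:   50 s per request,   72 requests per slot-hour
```

Throttling output on the same hardware ties up the slot five times longer and serves a fifth as many requests, so it frees no capacity. Deliberately slow output only pencils out if it comes from genuinely slower hardware that would otherwise sit idle.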