The tokens-per-minute limit for tier 1 was slashed from 300k to 30k for gpt-4o and gpt-4-turbo.
This means a request to a 128k-context model with 28k+ input tokens and a reasonable max_tokens will fail outright.
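A minimal sketch of the arithmetic, assuming (as the failures above suggest) that the rate limiter counts the prompt tokens plus the requested max_tokens against the per-minute cap; the function name and token figures are illustrative, not from any official SDK:

```python
# Hypothetical illustration of the tier-1 cap described above.
TPM_LIMIT = 30_000  # assumed tier-1 tokens-per-minute limit

def fits_in_limit(input_tokens: int, max_tokens: int, limit: int = TPM_LIMIT) -> bool:
    """Assumption: the limiter counts input tokens plus requested max_tokens."""
    return input_tokens + max_tokens <= limit

# A 28k-token prompt with a modest 4k max_tokens already exceeds the cap:
print(fits_in_limit(28_000, 4_000))   # 32,000 > 30,000 -> False
```

Under this accounting, no single large-context request can ever succeed at tier 1, no matter how slowly you send requests.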
Assistants with retrieval or file_search also blow past this limit; users have reported failures there as well.
A user can’t spend $0.15 without first paying over $50, waiting, and paying again to have their limits recalculated?
Tier 1 can’t even queue a single 100k-token call to run overnight.
30,000 tokens per minute vs. 10 million for tier 5 has nothing to do with trust or server load.