Here is a screenshot of the documentation page for GPT-4.1. There is a toggle on the page for “Long Context” that changes the TPM value from 30k to 200k. Unless this is a documentation error, how does one access the higher limit?
“For long context models like GPT-4.1, there is a separate rate limit for long context requests.”
“Long context” means requests whose inputs are estimated at more than 128k input tokens (the edge rate limiter only performs an estimate).
That means there is an awkward gulf of input lengths where your request will fail: the input exceeds your TPM, but you have not triggered the long-context limiter. For a model with the normal 30,000 TPM limit on tier 1, and a ‘reset’ limiter history:
0–30,000: OK
30,001–127,999: FAIL (over the 30k TPM normal limit, below the long-context line)
128,000–200,000: OK (long-context limit applies)
200,000+: FAIL (over the 200k long-context TPM)
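To make those buckets concrete, here is a minimal sketch that predicts which limiter a request would land under. The limit values are the tier-1 GPT-4.1 numbers discussed above; using tiktoken as a stand-in for the edge limiter’s estimate is my own assumption, since the actual estimation method isn’t documented:

```python
# Sketch: predict which rate-limit bucket an input falls into.
# Assumptions: tier-1 GPT-4.1 limits (30k TPM normal, 200k TPM long-context),
# and that tiktoken's count roughly approximates the edge limiter's estimate.
import tiktoken

NORMAL_TPM = 30_000           # tier-1 standard TPM for gpt-4.1
LONG_CONTEXT_START = 128_000  # inputs above this trigger the long-context limiter
LONG_CONTEXT_TPM = 200_000    # tier-1 long-context TPM

def classify(text: str, model: str = "gpt-4.1") -> str:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("o200k_base")  # fallback for newer models
    tokens = len(enc.encode(text))
    if tokens <= NORMAL_TPM:
        return f"{tokens} tokens: OK (normal limiter)"
    if tokens < LONG_CONTEXT_START:
        return f"{tokens} tokens: FAIL (over 30k TPM, under the 128k line)"
    if tokens <= LONG_CONTEXT_TPM:
        return f"{tokens} tokens: OK (long-context limiter)"
    return f"{tokens} tokens: FAIL (over 200k long-context TPM)"
```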
I also note that OpenAI has completely removed the centralized documentation of per-tier limits by model from the rate limit docs. One must now tediously explore individual model pages, which also make no mention of the higher long-context limits. That should be restored for transparency, organization, and planning.
You can:
- Pad requests with more messages or documentation if they individually fall in this TPM failure range and are being refused by the rate limiter, so they cross the 128k long-context line (see the sketch after this list).
- Increase your tier: more than seven days after your first payment, make another payment bringing the total paid above $50 for tier 2. That gives 450,000 TPM for normal requests against this model.
- Use GPT-5 models, which have higher starting TPM at tier 1 (increases that have not been “backported” to older models).
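If you go the padding route, a rough sketch of what that might look like. The filler content, safety margin, and tokenizer choice are all my own assumptions, not anything OpenAI documents; in practice you would append genuinely relevant reference material rather than repeated filler:

```python
# Sketch: pad a prompt past the 128k long-context threshold so it is
# handled by the 200k TPM long-context limiter instead of failing
# against the 30k TPM normal limiter. Hypothetical helper, not an API.
import tiktoken

NORMAL_TPM = 30_000
LONG_CONTEXT_START = 128_000
MARGIN = 1_000  # assumed safety margin, since the edge limiter only estimates

def pad_past_threshold(prompt: str, filler: str) -> str:
    enc = tiktoken.get_encoding("o200k_base")
    tokens = len(enc.encode(prompt))
    # Only pad if the prompt sits inside the failure gulf.
    if tokens <= NORMAL_TPM or tokens >= LONG_CONTEXT_START:
        return prompt
    needed = (LONG_CONTEXT_START + MARGIN) - tokens
    filler_tokens = enc.encode(filler)  # filler must be non-empty
    # Repeat the filler until we have enough tokens, then truncate.
    padded = (filler_tokens * (needed // len(filler_tokens) + 1))[:needed]
    return prompt + "\n\n" + enc.decode(padded)
```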
I was considering the tier bump as the fastest way to a solution. It needs to be 4.1. I could actually pull together enough relevant content to cross that gulf (>128k), but the time required to curate it well enough to align with what I’m aiming at means the $50 route probably makes more sense, especially as this isn’t a long-term architecture, just some experiments I’m running.
Thanks. I had remembered seeing something about the 128k line, but had trouble finding whether simply having content that large automatically routes you to the higher rate limit.
