Here is a screenshot of the documentation page for GPT-4.1. There is a toggle on the page for “Long Context” that changes the TPM value from 30k to 200k. Unless this is a documentation error, how does one access the higher limit?
“For long context models like GPT-4.1, there is a separate rate limit for long context requests.”
“Long context” means requests whose inputs are estimated at more than 128k input tokens (the edge rate limiter only performs an estimate).
That means there is an awkward gulf of input lengths where your request will fail: the input exceeds your TPM, but you have not triggered the long-context limiter. For a model with the normal 30,000 TPM limit on tier 1, and a ‘reset’ limiter history:
0–30,000: OK
30,001–127,999: FAIL (over the 30k TPM normal limit, below the long-context line)
128,000–200,000: OK (long-context limit applies)
200,000+: FAIL (over the 200k long-context TPM)
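To make those buckets concrete, here is a minimal sketch that predicts which limiter a request would land under. The limit values are the tier-1 GPT-4.1 numbers discussed above; using tiktoken as a stand-in for the edge limiter’s estimate is my own assumption, since the actual estimation method isn’t documented:

```python
# Sketch: predict which rate-limit bucket an input falls into.
# Assumptions: tier-1 GPT-4.1 limits (30k TPM normal, 200k TPM long-context),
# and that tiktoken's count roughly approximates the edge limiter's estimate.
import tiktoken

NORMAL_TPM = 30_000           # tier-1 standard TPM for gpt-4.1
LONG_CONTEXT_START = 128_000  # inputs above this trigger the long-context limiter
LONG_CONTEXT_TPM = 200_000    # tier-1 long-context TPM

def classify(text: str, model: str = "gpt-4.1") -> str:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("o200k_base")  # fallback for newer models
    tokens = len(enc.encode(text))
    if tokens <= NORMAL_TPM:
        return f"{tokens} tokens: OK (normal limiter)"
    if tokens < LONG_CONTEXT_START:
        return f"{tokens} tokens: FAIL (over 30k TPM, under the 128k line)"
    if tokens <= LONG_CONTEXT_TPM:
        return f"{tokens} tokens: OK (long-context limiter)"
    return f"{tokens} tokens: FAIL (over 200k long-context TPM)"
```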
I also note that OpenAI has completely removed the centralized documentation of per-tier limits by model from the rate limit docs. One must now tediously explore individual model pages, which also make no mention of the higher long-context limits. That should be restored for transparency, organization, and planning.
You can:
- Pad requests with more messages or documentation if they individually fall in this TPM failure range and are being refused by the rate limiter, so they cross the 128k long-context line (see the sketch after this list).
- Increase your tier: more than seven days after your first payment, make another payment bringing the total paid above $50 for tier 2. That gives 450,000 TPM for normal requests against this model.
- Use GPT-5 models, which have higher starting TPM at tier 1 (increases that have not been “backported” to older models).
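If you go the padding route, a rough sketch of what that might look like. The filler content, safety margin, and tokenizer choice are all my own assumptions, not anything OpenAI documents; in practice you would append genuinely relevant reference material rather than repeated filler:

```python
# Sketch: pad a prompt past the 128k long-context threshold so it is
# handled by the 200k TPM long-context limiter instead of failing
# against the 30k TPM normal limiter. Hypothetical helper, not an API.
import tiktoken

NORMAL_TPM = 30_000
LONG_CONTEXT_START = 128_000
MARGIN = 1_000  # assumed safety margin, since the edge limiter only estimates

def pad_past_threshold(prompt: str, filler: str) -> str:
    enc = tiktoken.get_encoding("o200k_base")
    tokens = len(enc.encode(prompt))
    # Only pad if the prompt sits inside the failure gulf.
    if tokens <= NORMAL_TPM or tokens >= LONG_CONTEXT_START:
        return prompt
    needed = (LONG_CONTEXT_START + MARGIN) - tokens
    filler_tokens = enc.encode(filler)  # filler must be non-empty
    # Repeat the filler until we have enough tokens, then truncate.
    padded = (filler_tokens * (needed // len(filler_tokens) + 1))[:needed]
    return prompt + "\n\n" + enc.decode(padded)
```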
I was considering the tier bump as the fastest way to a solution. It needs to be 4.1. I could actually pull together enough relevant content to cross that gulf (>128k), but the time required to curate it well enough to align with what I’m aiming at means the $50 route probably makes more sense, especially as this isn’t a long-term architecture, just some experiments I’m running.
Thanks. I had remembered seeing something about the 128k line, but had trouble finding whether simply having content that large automatically routes you to the higher rate limit.
