Anomalous TPM rate for gpt-3.5-turbo: tier 2 higher than tier 3, plus many more limit issues

From the gpt-3.5-turbo model details page:

Tier 2 gets 2 million TPM, but tier 3 gets 0.8 million?

(I can confirm the latter figure is what my org is actually delivered.)

It seems either the documentation is wrong, or the delivered tier 3 is a demotion…

Same issue with the 16k model: https://platform.openai.com/docs/models/gpt-3.5-turbo-16k-0613

chatgpt-4o-latest

This model has strictly curtailed usage, but its own model page does not reflect that:

Tier 5: documentation says 30_000_000 TPM; 10_000 RPM. Actual org:


(gpt-5-chat-latest, by contrast, is not “strictly curtailed”…)


gpt-4o-audio-preview-2025-06-03 - default rate

More issues: the organization has been given demoted gpt-4o-audio-preview-2025-06-03 rates:

The model page for “audio” does not state the separate, diminished “all other models” rate being delivered to the org, and it has its own issue with batch limits for tier 4:

…I’m going down the line items of the models endpoint, and I’m not even at the halfway point…
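(For anyone repeating this audit: a minimal sketch, assuming the official openai Python SDK and an OPENAI_API_KEY environment variable, that walks the same /v1/models line items so each ID can be checked against the org limits page.)

```python
# List every model ID visible to the organization, in the order the
# models endpoint returns them, for checking against documented limits.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for model in client.models.list():
    print(model.id)
```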

gpt-4.1 “long context”

The dispensing of tiers is irregular (a quick check of the math follows the list):

normal:


tier 2: 450_000 => 500_000 = +11%
tier 3: 800_000 => 1_000_000 = +25%
tier 4: 2_000_000 => 5_000_000 = +150%
tier 5: 30_000_000 => 10_000_000 = -67%
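
A quick check of that math in plain Python, assuming the arrows compare the normal pool to the long-context pool per tier:

```python
# Recompute the percentage change for each tier's TPM pair listed above.
normal = {2: 450_000, 3: 800_000, 4: 2_000_000, 5: 30_000_000}
long_context = {2: 500_000, 3: 1_000_000, 4: 5_000_000, 5: 10_000_000}

for tier in normal:
    pct = (long_context[tier] - normal[tier]) / normal[tier] * 100
    print(f"tier {tier}: {normal[tier]:_} => {long_context[tier]:_} = {pct:+.0f}%")
```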

A long-context “mode” is detected on your input by the rate limiter (which can’t see images or know the token cost of PDFs).

Tier 1 rates are made even sillier here. For a single input (see the probe sketch after this list):

  • 0 to 30_000 tokens: OK;
  • 30_001 to a mystery cutoff: fail;
  • cutoff to 200_000: OK again.
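
A hypothetical probe of that band (the model name, padding text, and sizes here are my assumptions, not the exact test I ran):

```python
# Send ever-larger single inputs and record which approximate token
# counts the rate limiter rejects outright with a 429.
from openai import OpenAI, RateLimitError

client = OpenAI()

for approx_tokens in (25_000, 35_000, 60_000, 120_000, 190_000):
    padding = "hello " * approx_tokens  # each repeat is roughly one token
    try:
        client.chat.completions.create(
            model="gpt-4.1",
            max_tokens=1,
            messages=[{"role": "user", "content": padding}],
        )
        print(f"~{approx_tokens:_} tokens: OK")
    except RateLimitError as err:
        print(f"~{approx_tokens:_} tokens: fail ({err})")
```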

gpt-4o-mini-search-preview

Wrong rates documented vs delivered by API limits

The mini search model is documented as having the same rates as normal “mini”.

However, in the organization, we can see that the search model’s limits are held at a different, much lower rate than gpt-4o-mini, likely the same for all tiers - and with no batching:


Hey @_j ,

Working on these - thank you so much for taking the time to document!

  1. gpt-3.5-turbo Tier 3 should get 4M TPM, update in progress.
  2. gpt-3.5-turbo-16k-0613 is no longer in use, so we’ll remove this page entirely.
  3. Will share updates on the rest as we go.

Yes: you should note appropriately that gpt-3.5-turbo-16k is still operational as an alias; however, the whole point of the model is broken by the 4k output constraint of the model it now points to:

"model": "gpt-3.5-turbo-0125"

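A minimal way to confirm the redirection, assuming the openai Python SDK: the model field on the response reports the snapshot actually served.

```python
# Call the alias and inspect which snapshot actually answers.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    max_tokens=1,
    messages=[{"role": "user", "content": "hi"}],
)
print(response.model)  # "gpt-3.5-turbo-0125", which caps output at 4k tokens
```
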
I just spotted more rate limit provisioning issues:

What the new audio models should be provisioned at:

What is actually seen for my tier-5 org is the “all other models” defaults:

Same situation for realtime.

What the new realtime models should be provisioned at:

What is provisioned:

This is likely systematic and system-wide, and needs addressing.
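
One way to audit what is actually provisioned without the dashboard is the documented x-ratelimit-* response headers; a sketch, with the model name as a placeholder for whichever model you are checking:

```python
# Read the delivered RPM/TPM limits off a raw API response's headers
# and compare them against the model documentation page for your tier.
from openai import OpenAI

client = OpenAI()

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # assumption: substitute the model under audit
    max_tokens=1,
    messages=[{"role": "user", "content": "hi"}],
)
for name in ("x-ratelimit-limit-requests", "x-ratelimit-limit-tokens"):
    print(name, "=", raw.headers.get(name))
```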

Shared with the docs team. Thank you for the reports, @_j ! :raising_hands:


An update on my own org’s rate limits, as an indicator of what may be global tier limits contrary to documentation and common sense - where the API needs to be fixed, not the documentation:

  • Corrected: There is a new group pool called “gpt-realtime” that has all models listed under it, with updated and correct rate limits.

  • Bad: gpt-audio and its mini variants still stand alone under the top-level “chat” category and are assigned default limits, far below the documented tier rates.

  • Bad: o3-deep-research-2025-06-26, along with gpt-4o-transcribe-diarize and gpt-4o-audio-preview-2025-06-03, is also still getting the default rate.

  • Bad: even babbage-002 and davinci-002 have rapidly increasing tier limits on their model documentation pages, but get “default”.

  • Bad: chatgpt-4o-latest: no change has been made - the documentation says tier 1 is 500 RPM; I have 200 RPM.

Additionally, the per-tier default “all other models” limits are not documented anywhere except on an actual organization limits screen (at the bottom). These default rate limits also appear nearly identical from tier 1 to tier 5, and with low limits like 250,000 TPM assigned to tier-5 premium models, they should be considered application-breaking.

Action is still needed to correct organizations.