Rate limit discrepancies across fine-tuned models

Hello! I’m wondering if someone can clear up some confusion I have over the fine-tuned models I’m using.

I am currently experimenting with two new fine-tuned models I created. The first is based on gpt-4o-mini-2024-07-18, the other on gpt-4.1-mini-2025-04-14. For context, my organization is on Tier 3.

After jumping from Tier 1 to Tier 3, I no longer had the per-minute rate limit issues I was seeing with the fine-tuned model based on gpt-4o-mini. However, after tuning a new model on gpt-4.1-mini, it seems I’m running into the default rate limits (250,000 TPM and 3,000 RPM) and hitting them almost immediately.
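In case it helps with debugging: each API response includes `x-ratelimit-*` headers that report the limits actually being applied to your key for that model, which should show whether the fine-tuned model is really being capped at the defaults. Here’s a minimal sketch of pulling those fields out of a response-headers mapping — the header names are the documented ones, but the sample values below are made up for illustration:

```python
def extract_rate_limits(headers: dict) -> dict:
    """Pull the rate-limit fields out of a response-headers mapping."""
    keys = [
        "x-ratelimit-limit-requests",
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining-tokens",
    ]
    return {k: headers[k] for k in keys if k in headers}

# Fabricated example values, as if taken from response.headers
# after a request to the fine-tuned model:
sample_headers = {
    "x-ratelimit-limit-requests": "3000",
    "x-ratelimit-limit-tokens": "250000",
    "x-ratelimit-remaining-requests": "2999",
    "x-ratelimit-remaining-tokens": "249100",
    "content-type": "application/json",
}
print(extract_rate_limits(sample_headers))
```

Comparing the `limit` headers returned for the gpt-4o-mini-based model versus the gpt-4.1-mini-based one would confirm whether the two are actually being given different limits.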

I’m not sure what’s going on here—do I just have to wait for something to kick in on OpenAI’s side before I’m able to use the fine-tuned model based on gpt-4.1-mini more freely? Why is it hitting the defaults when the rate limit page says both mini models should have the same limits? Any help or insight would be appreciated, thank you!