I was submitting two batches of 50k requests each for gpt-4.1-mini using the Batch API. One of these led to “31675 completed, 18325 failed of 50000 total requests”. Each of the failed requests had the error message: “You’ve exceeded the 1000 request/min rate limit, please slow down and try again.”.
First of all, my organization's gpt-4.1-mini RPM limit is 10,000.
Second of all, I believe it should be the responsibility of OpenAI to space out the requests so that an RPM limit is not hit. The Batch API is allowed to take up to 24 hours anyway (as specified in the completion_window argument!).
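In the meantime, one workaround I'm considering is splitting the 50k-line JSONL input into sub-1,000-request chunks and submitting those as separate batches spaced out over time. A minimal sketch (assuming the limit really is enforced per batch; `split_batch_input` and the 900-line chunk size are my own illustrative choices, not part of the OpenAI SDK):

```python
from pathlib import Path

def split_batch_input(src: Path, out_dir: Path, max_lines: int = 900) -> list:
    """Split a Batch API JSONL input file into chunk files of at most
    max_lines requests each, keeping every chunk under a 1,000-request
    ceiling. Returns the chunk file paths in order."""
    lines = src.read_text().splitlines()
    chunks = []
    for i in range(0, len(lines), max_lines):
        part = out_dir / f"{src.stem}_part{i // max_lines:03d}.jsonl"
        # Write this slice of requests as its own JSONL file.
        part.write_text("\n".join(lines[i:i + max_lines]) + "\n")
        chunks.append(part)
    return chunks
```

Each chunk file could then be uploaded and submitted as its own batch, with a pause between submissions.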
Hmmm…
While your organization’s overall GPT-4.1-mini RPM limit is 10,000, the Batch API enforces a separate, stricter rate limit of 1,000 requests per minute per batch. This limit applies regardless of your total RPM quota.
The completion_window parameter specifies the maximum allowed time for the entire batch to complete (up to 24 hours), but it does not control the rate at which requests are sent to the API. It's the client's responsibility to throttle the requests accordingly; if the rate limit is exceeded, the API returns the "exceeded the 1000 request/min rate limit" error.
To avoid this issue, you could:
- Spread out the 50,000 requests over at least 50 minutes (50,000 ÷ 1,000 RPM).
- Implement rate limiting or exponential backoff on your side, respecting the Retry-After header.
- Or contact OpenAI support to request a higher batch rate limit.
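The second suggestion can be sketched in a few lines of client-side code. This is illustrative only — `Pacer` and `backoff_delay` are hypothetical helpers, not part of the OpenAI SDK — showing a fixed-rate throttle plus exponential backoff with full jitter that honors Retry-After when the server sends it:

```python
import random
import time

def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).
    Honor the server's Retry-After value when present; otherwise use
    exponential backoff with full jitter, capped at `cap` seconds."""
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base * 2 ** attempt))

class Pacer:
    """Simple client-side throttle: at most `rpm` requests per minute,
    enforced by spacing requests `60 / rpm` seconds apart."""
    def __init__(self, rpm=1000):
        self.interval = 60.0 / rpm
        self._next = 0.0  # earliest time the next request may go out

    def wait(self, now=None):
        """Return how many seconds to sleep before sending the next
        request, and reserve the following slot."""
        now = time.monotonic() if now is None else now
        sleep = max(0.0, self._next - now)
        self._next = max(now, self._next) + self.interval
        return sleep
```

A caller would do `time.sleep(pacer.wait())` before each request, and `time.sleep(backoff_delay(attempt, retry_after))` after a 429 response.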
@Marcin_B The user cannot control how fast OpenAI is making the requests with the Batch API. Therefore, I don't think it's the user's responsibility when an RPM limit is hit.