I have batch requests (using gpt-4o-mini). But I see a batch API limit of 2M tokens per day (TPD). I have been trying to find ways to request an increase in the limit. I have a task that I want to run which potentially uses about 25M tokens. I don’t want to wait around for 23-24 days to run the task. I don’t see any option to request this on the organization/limits page or the project/limits page.
I am wondering if there is a way to request an increase, if only on a temporary basis, since I will not need this capacity after the task is complete.
The limit is actually the maximum number of tokens that can be enqueued at a time (amount waiting).
Turnaround is under 24 hours, sometimes far less (and sometimes cancelled at 24 hours). So you could potentially push more through in a day by watching small jobs complete.
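To make the "watch small jobs complete" idea concrete, here is a rough sketch of the bookkeeping only. It assumes the limit really is a cap on tokens enqueued at one time, and it makes no API calls: `submit_while_room` and `simulate` are hypothetical helper names, and a "round" is a simplification in which every in-flight batch finishes before more are submitted.

```python
from collections import deque


def submit_while_room(pending, in_flight_tokens, queue_limit):
    """Pop chunk sizes off `pending` while they still fit under the queue limit.

    Returns the list of chunk sizes that could be submitted right now.
    """
    submitted = []
    while pending and in_flight_tokens + pending[0] <= queue_limit:
        size = pending.popleft()
        in_flight_tokens += size
        submitted.append(size)
    return submitted


def simulate(total_tokens, chunk_tokens, queue_limit):
    """Count how many times the queue must be refilled to push everything through.

    Assumption: all in-flight batches complete before the next refill.
    """
    pending = deque([chunk_tokens] * (total_tokens // chunk_tokens))
    rounds = 0
    while pending:
        if not submit_while_room(pending, 0, queue_limit):
            raise ValueError("a single chunk is larger than the queue limit")
        rounds += 1
    return rounds


# 25M tokens in 500k-token batch files against a 2M-token queue limit
print(simulate(25_000_000, 500_000, 2_000_000))
```

With small chunks you can top the queue back up as soon as any one batch finishes instead of waiting for a whole 2M-token file, which is where the extra daily throughput would come from.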
You can prepay your way up to a higher “tier”, the only request method being “sending unrefundable money”. $50 total paid, with the most recent payment more than 7 days after the first. Not exactly the “fair” that is “ensured” when you’re talking about under $10 total:
Rate limits ensure fair and reliable access to the API by placing specific caps on requests or tokens used within a given time period. Your usage tier determines how high these limits are set and automatically increases as you send more requests and spend more on the API.
The documentation is a fib about “send more requests”. You don’t have to make an API request, just a series of payments.
gpt-4o-mini

| Tier | RPM | RPD | TPM | Batch queue limit |
|---|---|---|---|---|
| Free | 3 | 200 | 40,000 | - |
| Tier 1 | 500 | 10,000 | 200,000 | 2,000,000 |
| Tier 2 | 5,000 | - | 2,000,000 | 20,000,000 |
| Tier 3 | 5,000 | - | 4,000,000 | 40,000,000 |
| Tier 4 | 10,000 | - | 10,000,000 | 1,000,000,000 |
| Tier 5 | 30,000 | - | 150,000,000 | 15,000,000,000 |
You could also contract the batch services with an organization capable of 1000x that of tier 1.
Thanks a ton @_j. I eventually paid up and moved to tier 2. Now I can complete this task in 2 days. I wish this table were documented somewhere public. I am not sure the limit is on the number of tokens enqueued, though. I had exhausted the 2MM token limit and had no batches enqueued, and yet my subsequent requests on the same day kept failing with the reason “token_limit_exceeded”. I think the reset happens at 12:00 am (I am not sure if it’s UTC or PST).
Anyway, this works for me as I intend to use up those credits (and much more) very soon anyway.
If you look at the quirky tier rate positioning, and aren’t seeking a discount beyond the possibility of a cache hit, you’ll see that you could also be done in about 15 minutes with a stream of individual API calls: at tier 2, the 2,000,000 TPM limit covers 25M tokens in roughly that time.
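For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch using the gpt-4o-mini figures posted earlier in this thread. The helper names are made up for illustration, and it assumes you can actually sustain the full TPM limit, which real traffic may not manage.

```python
import math

# Figures from the gpt-4o-mini tier table in this thread
TPM = {"Tier 1": 200_000, "Tier 2": 2_000_000}
BATCH_QUEUE = {"Tier 1": 2_000_000, "Tier 2": 20_000_000}

TOTAL_TOKENS = 25_000_000  # size of the task described in the first post


def minutes_at_tpm(total_tokens, tpm):
    """Minutes of non-batch calls needed if the TPM limit is fully saturated."""
    return total_tokens / tpm


def queue_fills(total_tokens, queue_limit):
    """How many times the batch queue has to be filled to cover the job."""
    return math.ceil(total_tokens / queue_limit)


for tier in TPM:
    print(
        tier,
        f"{minutes_at_tpm(TOTAL_TOKENS, TPM[tier])} min of direct calls,",
        f"{queue_fills(TOTAL_TOKENS, BATCH_QUEUE[tier])} batch queue fills",
    )
```

At tier 2 that works out to 12.5 minutes of direct calls versus two full batch queues, so the trade-off is mostly the batch discount against the wait.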