> I have no problem with the 30k TPM, but what does the 90k TPD mean? Can I really only have 90k tokens done per day?
The last column in the usage tier-1 table shown here is the batch queue limit, not a tokens-per-day limit.

That is the most you can have enqueued at once for the batch endpoint: the maximum depth of jobs waiting to be processed. A batch is a JSONL file of multiple API requests that is run during off-peak time, at a 50% discount; when processing finishes, a file with the model results becomes available for download.

Since the batch API can have up to a 24-hour turnaround, in practice you may be able to plan for no more than 90k tokens of batch work per day, unless you poll the batch status automatically and submit the next job as soon as queue space frees up.
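As an illustrative sketch (the helper and the greedy grouping are my own, not SDK code): given per-request token estimates, you could group requests into successive batch submissions so that each one stays under the 90,000-token queue limit, submitting the next wave once the previous one drains.

```python
def plan_batch_submissions(request_token_counts, queue_limit=90_000):
    """Greedily group per-request token estimates into successive
    batch submissions, each within the queue limit.

    Hypothetical helper for illustration; real usage would poll the
    Batch API and submit the next wave when the queue empties.
    """
    submissions, current, current_tokens = [], [], 0
    for tokens in request_token_counts:
        if tokens > queue_limit:
            raise ValueError(f"single request of {tokens} tokens exceeds the queue limit")
        if current_tokens + tokens > queue_limit:
            # current wave is full; start the next submission
            submissions.append(current)
            current, current_tokens = [], 0
        current.append(tokens)
        current_tokens += tokens
    if current:
        submissions.append(current)
    return submissions

# Twelve 20k-token jobs need three waves of four under a 90k queue limit.
waves = plan_batch_submissions([20_000] * 12)
```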
For ordinary (non-batch) calls, the API will immediately reject any request to gpt-4o whose input is larger than 30k tokens. You can also run into the limit with Assistants, after paying for internal turns: a single run can make multiple model calls, and file search can load 16k tokens of document retrieval into the context on top of a growing chat length.
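One way to avoid a rejected call is to estimate input size before sending. A minimal sketch, assuming a rough four-characters-per-token heuristic (use a real tokenizer such as tiktoken for exact counts); the function names are hypothetical, and 30,000 is the tier-1 gpt-4o TPM from the table below:

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English.
    For exact counts, use a real tokenizer (e.g. tiktoken)."""
    return max(1, len(text) // 4)

def fits_tier1_gpt4o(messages, tpm_limit=30_000) -> bool:
    """Check whether a list of message strings is under the
    tier-1 gpt-4o 30k TPM limit before making the call."""
    total = sum(approx_tokens(m) for m in messages)
    return total <= tpm_limit

messages = ["You are a helpful assistant.", "Summarize this report..."]
if not fits_tier1_gpt4o(messages):
    # split the input, trim retrieval, or use a higher-TPM model
    # such as gpt-4o-mini (200k TPM at tier 1)
    pass
```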
Model | RPM (requests/min) | RPD (requests/day) | TPM (tokens/min) | Batch Queue Limit (tokens)
---|---|---|---|---
gpt-4o | 500 | - | 30,000 | 90,000 |
gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000 |
gpt-4o-realtime-preview | 100 | 100 | 20,000 | - |
o1-preview | 500 | - | 30,000 | 90,000 |
o1-mini | 500 | 10,000 | 200,000 | 2,000,000 |
gpt-4-turbo | 500 | - | 30,000 | 90,000 |
gpt-4 | 500 | 10,000 | 10,000 | 100,000 |
gpt-3.5-turbo | 3,500 | 10,000 | 200,000 | 2,000,000 |
omni-moderation-* | 500 | 10,000 | 10,000 | - |
text-embedding-3-large | 3,000 | - | 1,000,000 | 3,000,000 |
text-embedding-3-small | 3,000 | - | 1,000,000 | 3,000,000 |
text-embedding-ada-002 | 3,000 | - | 1,000,000 | 3,000,000 |
whisper-1 | 500 | - | - | - |
tts-1 | 500 | - | - | - |
tts-1-hd | 500 | - | - | - |
dall-e-2 | 500 img/min | - | - | - |
dall-e-3 | 500 img/min | - | - | - |