The following is taken from the OpenAI docs. I'm not able to understand the enqueued-prompt-tokens limit, so can someone explain it simply, ideally with an example? Also, the rate limit docs (https://platform.openai.com/docs/guides/rate-limits/usage-tiers) don't mention any limit on the number of tokens in each batch request, so does that mean there is no such limit?
Rate Limits
Batch API rate limits are separate from existing per-model rate limits. The Batch API has two new types of rate limits:
- Per-batch limits: A single batch may include up to 50,000 requests, and a batch input file can be up to 100 MB in size. Note that /v1/embeddings batches are also restricted to a maximum of 50,000 embedding inputs across all requests in the batch.
- Enqueued prompt tokens per model: Each model has a maximum number of enqueued prompt tokens allowed for batch processing. You can find these limits on the Platform Settings page.
There are no limits on output tokens or the number of submitted requests for the Batch API today. Because Batch API rate limits are a new, separate pool, using the Batch API will not consume tokens from your standard per-model rate limits, thereby offering you a convenient way to increase the number of requests and processed tokens you can use when querying our API.
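To make the two limit types concrete, here is a minimal pre-flight sketch of how I'd check a batch input file before submitting it. It assumes the standard .jsonl batch format for /v1/chat/completions with plain-text message contents, and uses tiktoken to estimate prompt tokens; the ENQUEUED_TOKEN_LIMIT value is a made-up placeholder, since the real per-model caps are on the Platform Settings page:

```python
import json
import os

import tiktoken  # pip install tiktoken

# The first two values are from the docs quoted above; the enqueued-token
# cap is a hypothetical placeholder -- the real per-model values are on the
# Platform Settings page and vary by usage tier.
MAX_REQUESTS_PER_BATCH = 50_000
MAX_INPUT_FILE_BYTES = 100 * 1024 * 1024   # 100 MB per batch input file
ENQUEUED_TOKEN_LIMIT = 20_000_000          # hypothetical per-model cap


def preflight_check(path: str, model: str = "gpt-4o-mini") -> int:
    """Roughly validate a .jsonl batch file against the per-batch limits
    and return an estimate of the prompt tokens it would enqueue."""
    enc = tiktoken.encoding_for_model(model)
    n_requests = 0
    prompt_tokens = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            request = json.loads(line)
            n_requests += 1
            # Sum tokens over every message; this slightly undercounts
            # because it ignores per-message formatting overhead.
            for msg in request["body"]["messages"]:
                prompt_tokens += len(enc.encode(msg["content"]))

    assert n_requests <= MAX_REQUESTS_PER_BATCH, "over 50,000 requests"
    assert os.path.getsize(path) <= MAX_INPUT_FILE_BYTES, "input file over 100 MB"
    # Pending batches for the same model also count toward the enqueued cap,
    # so in practice you'd add the tokens already in flight before comparing.
    assert prompt_tokens <= ENQUEUED_TOKEN_LIMIT, "would exceed the enqueued cap"
    return prompt_tokens
```

My current understanding is that "enqueued prompt tokens" counts the estimated prompt tokens of all batches still pending for a given model: a new batch fails if its prompt tokens plus those already in flight exceed the cap, and capacity frees up again as earlier batches finish. Is that right?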