Batch API is now available

Batch queue limits depend on the tier you are in. If 90,000 tokens is your limit, that suggests you may be in Tier 1 at the moment?

Under the usage tier documentation you can see the limits by Tier and model:
https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-one

As for your other point, at any point in time you can have no more than the stated limit in the queue for batch processing. That’s just the way the system is designed.
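For anyone new to the terminology, here is a rough sketch with the Python SDK of submitting a batch and watching it move through that queue. The file name and polling interval are placeholders, not anything prescribed by the API.

```python
from openai import OpenAI
import time

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the .jsonl input file (one request per line) for batch processing
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# Queue the batch job against the chat completions endpoint
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until the job leaves the queue (completed, failed, expired, or cancelled)
while batch.status in ("validating", "in_progress", "finalizing"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

print(batch.status, batch.output_file_id)
```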

1 Like

Is gpt-4o supported on the batch API?

Yes, it is supported as well.

Thanks for the response; do you mean I hit the TPM (tokens per minute) limit accidentally by submitting the second batch too soon?

Right now I am already at Tier 3, which is 600,000 TPM, and my 200,000-token batches work without errors.

I am referring to the batch queue limit which is a separate limit:

[Screenshot of the Tier 3 rate-limit table showing the batch queue limit]

Source: https://platform.openai.com/docs/guides/rate-limits/tier-3-rate-limits

You should be a developer already familiar with using the chat completions endpoint, having made requests in JSON directly against the API rather than through a library module.

If you are unfamiliar with how to create a “chat” manually, you will not have much success processing many of them into a specially formatted file where each line is a job to be performed.
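For illustration, each line of that .jsonl file is one self-contained chat completions request, along these lines (the custom_id values, model, and message content here are placeholders):

```json
{"custom_id": "job-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize document 1 ..."}]}}
{"custom_id": "job-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize document 2 ..."}]}}
```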

Thanks; something changed there. It is peak hours in America right now, and it is exceptionally fast: 600,000 tokens (via 6 files, 100 tasks each) takes about 5 minutes. I am at Tier 3, hitting gpt-4-turbo.

1 Like

Hello @jeffsharris
Thanks for the Batch API. One question: how do you deal with long responses (over 4,096 tokens)? When I make a single call via the API, I simply launch a second call with “continue” as the user text (and first pass the previous response back as the assistant text).
How does this work with the Batch API? Thank you.
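For reference, here is a sketch of that continuation trick on the synchronous endpoint (the model and prompt are placeholders); with the Batch API you would presumably have to build a follow-up batch from the first batch's output file.

```python
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "Write a very long report on topic X."}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=4096)

# If the model was cut off by the token limit, pass its partial answer back
# as the assistant turn and ask it to continue
if first.choices[0].finish_reason == "length":
    messages.append({"role": "assistant", "content": first.choices[0].message.content})
    messages.append({"role": "user", "content": "continue"})
    second = client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=4096)
```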

Feedback: be able to delete batch jobs.

So many of mine have failed in testing due to batch queue limits that running client.batches.list() and doing a list comprehension to filter out the failed batches takes more than 10 seconds.
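For context, the client-side filtering that takes that long looks roughly like this (a sketch; there is currently no endpoint for deleting a batch job):

```python
from openai import OpenAI

client = OpenAI()

# Page through every batch job ever created and drop the failed ones client-side;
# with many failed test batches, this pagination is what takes the ~10 seconds
all_batches = list(client.batches.list(limit=100))
usable = [b for b in all_batches if b.status != "failed"]
print(f"{len(all_batches)} batches total, {len(usable)} after filtering out failures")
```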

Hello @jeffsharris @jr.2509

What are the default values of max_tokens and temperature if they are not provided in the .jsonl file?

For example, for the model gpt-4o-2024-11-20, will max_tokens default to 16,384, since that is the maximum token output for this model?

You can assume that the same principles apply as for regular chat completions, i.e. the upper boundary for max output tokens applies - which is 16,384 tokens as you have rightly pointed out - and the temperature defaults to 1.

As for max output tokens - just to be clear - the max value is unlikely to be reached even if no specific limit is set.
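If you prefer not to rely on the defaults, you can of course set both explicitly in each line's request body; an illustrative line with placeholder values:

```json
{"custom_id": "job-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-2024-11-20", "messages": [{"role": "user", "content": "..."}], "max_tokens": 1024, "temperature": 0.2}}
```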

1 Like

Thank you for your answer.

Just to double-check: when max output tokens is not set explicitly, does that mean it defaults to the maximum possible value?

1 Like

Yes and no. If you don’t specify max output tokens explicitly, it simply means there is no hard stop on the number of tokens generated. The number of output tokens returned is highly dependent on the nature of your task and your prompt. In practice you will rarely - if ever - reach the maximum of 16,384 tokens.

Does that make sense?

2 Likes

I see, it makes sense. Without setting a hard stop, as you mentioned, it will depend fully on the task and the prompt. However, the upper limit will be 16,384, at least theoretically, even if it may be practically impossible to reach.

Is my interpretation correct?

2 Likes

Yes, that’s basically correct.

1 Like

It will be impossible to reach with purposeful, intelligent output, because of deliberate training against producing lengthy output.

However, it can be reached when the AI goes into a never-ending loop of producing repeating patterns. Hence the original safety factor: completions had no shutoff method at all except your own stop token (and for a chat model to stop, the AI must output OpenAI’s ChatML container stop token, by training).

1 Like

@_j

In this case, are we able to define what OpenAI’s ChatML container stop token value is? Most parameters in the OpenAI documentation that are not required have default values, but max_tokens and the new max_completion_tokens do not. Do they fully depend on the user prompt, and can they not be defined?

The stop token is the end-of-turn token that ceases the output. It is a high token number like 100260 or 100265, but you never receive it, as a matched stop sequence is not included in the response. It is written by the AI as an “I’m done responding, thought over”.

You can add more stop sequences via the API parameter “stop” - for example, four linefeeds or four tabs if you want to limit JSON mode’s ability to go crazy, or just a period to ensure you get nothing more than a sentence (where you then have to add your own period back).
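As a quick sketch of what that looks like on a chat completions request (the model and prompt are placeholders; the API accepts up to four stop sequences):

```python
from openai import OpenAI

client = OpenAI()

# Cap the reply at a single sentence by making "." a stop sequence;
# the matched stop sequence is not returned, so append the period yourself
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Describe the Batch API in one sentence."}],
    stop=["."],
    max_tokens=200,
)
print(response.choices[0].message.content + ".")
```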

1 Like