BatchAPI is now available

The number of jobs you can submit is also limited by the daily queue rate limit for batch.
I keep this qpd list updated, for tier 0 (free) through tier 5:

    "class": "gpt35turbo",
    "id": [
    "qpd": [200000, 200000, 400000, 10000000, 100000000, 300000000]
    "class": "gpt4turbo",
    "id": [
    "qpd": [0, 900000, 1350000, 40000000, 80000000, 300000000]

The limit reported is close to the tier-3 and tier-4 tokens for 3.5, but doesn’t align with any gpt-4-turbo tier (unless your account has a different rate limit than published). The error message doesn’t indicate the unit being measured.
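To check a planned submission against the list above, here is a minimal sketch. It assumes the queue limit is counted in tokens, which the error message does not confirm, and `fits_queue` is a hypothetical helper, not part of any SDK:

```python
# Queue limits copied from the list above; list index = usage tier (0 = free .. 5).
QPD = {
    "gpt35turbo": [200000, 200000, 400000, 10000000, 100000000, 300000000],
    "gpt4turbo": [0, 900000, 1350000, 40000000, 80000000, 300000000],
}

def fits_queue(model_class, tier, already_enqueued, new_batch):
    """Return True if the new batch still fits under the daily queue limit.

    Assumption: both counts are in tokens, the same unit as the limit.
    """
    limit = QPD[model_class][tier]
    return already_enqueued + new_batch <= limit
```

For example, `fits_queue("gpt4turbo", 1, 800000, 200000)` is false, because 1,000,000 is over the 900,000 listed for tier 1.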

It may be that you have submitted other batch jobs within the daily window of queue rate consideration?

(looks like tier 5 got a QPD boost for gpt-4-turbo models in the last week.)

If we were to speculate on the reason for the error: while the size of a single input did not exceed the 512MB maximum, the error might have occurred because the daily queue limit (QPD) was exceeded?

I just ran a batch job for about 100k embeddings requests.

Took 9 hours and spat out 2GB files.

All content_ids are null.

  1. What

Do you mean perhaps custom_id?

Did you populate that input field in your JSONL with your own generated unique values, so you can then match requests with responses?

I think you would be safe to only send [-a-z0-9_], and maybe not starting with a number (like failing uuid).
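A quick way to apply that conservative rule before uploading, as a sketch. The pattern is only the guess stated above (characters from [-a-z0-9_], no leading digit), not a documented restriction, and both function names are hypothetical:

```python
import re

# Conservative guess from the post above: only [-a-z0-9_], first character not a digit.
SAFE_CUSTOM_ID = re.compile(r"[-a-z_][-a-z0-9_]*")

def is_safe_custom_id(value):
    """True if the whole string matches the conservative pattern."""
    return SAFE_CUSTOM_ID.fullmatch(value) is not None

def make_custom_id(prefix, number):
    """Build an ID such as 'request12345' from a prefix and a counter."""
    return f"{prefix}{number}"
```

A UUID that happens to start with a digit would be rejected by this check, and so would any ID containing uppercase letters.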


Yes, of course. The custom_ids were set as alphanumeric strings.

So what’s a “content_id”? What is actually missing?

Did you get the embeddings vector responses in the file?

The content_ID is used in a batch request to link the vector back to its original content. Without it in the output file, the vectors are completely useless.

Sorry, I see the confusion. Yes, I did mean custom_id.
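For checking an output file, a sketch of linking vectors back by custom_id and counting the lines where that is not possible. The field layout used here (`custom_id`, `response.body.data[0].embedding`) is my assumption about the output lines, and `vectors_by_custom_id` is a hypothetical name:

```python
import json

def vectors_by_custom_id(output_jsonl):
    """Map custom_id -> embedding; count lines that cannot be linked back."""
    vectors, unmatched = {}, 0
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        custom_id = record.get("custom_id")
        # Assumed layout: response -> body -> data -> [ {"embedding": [...]} ]
        body = (record.get("response") or {}).get("body") or {}
        data = body.get("data") or []
        if custom_id is None or not data:
            unmatched += 1
            continue
        vectors[custom_id] = data[0]["embedding"]
    return vectors, unmatched
```

If `unmatched` equals the number of lines, every custom_id came back null or every request failed, which is the situation described above.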

I would send a small batch of embeddings with a custom_id field that is generated very similarly to the examples, such as "request12345" - just adding the number and even leaving out their hyphen.

If you find that your results when it runs still aren’t working, it may be a significant problem with batch API with embeddings, which was not announced but only recently added to the documentation. You can tag this topic’s OpenAI staff member for feedback then.

(and if you sent content_id in the JSONL inputs, that would be a very good reason for it not to work…)
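A small test batch along those lines could be generated like this. It is a sketch: the model name is only a placeholder assumption, `build_embedding_batch` is a hypothetical helper, and the per-line fields (`custom_id`, `method`, `url`, `body`) are my understanding of the batch input format:

```python
import json

def build_embedding_batch(texts, model="text-embedding-3-small", prefix="request"):
    """Return JSONL text with one /v1/embeddings request per input text.

    The model default is a placeholder assumption; use whichever model you need.
    """
    lines = []
    for number, text in enumerate(texts, start=1):
        request = {
            "custom_id": f"{prefix}{number}",  # simple ID, e.g. "request1"
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"
```

Note the key is `custom_id`; a `content_id` key in the input would not be what the output is matched on.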


Well, right now the limit is 90,000 input tokens, which is about 35-40 small documents in a batch. There is another issue too: I can upload a few batches, but I cannot submit them if the sum of tokens “in process” (2-3 batches being processed concurrently) is higher than 90,000. Then why is it called “batch”…

Ok, we can live with that.

But sometimes a batch of 25 stops at record No. 25, and it stops forever! I am forced to cancel it and resubmit the same job, and then it takes only a few minutes to complete.

P.S. I tried to cancel two jobs which got stuck on last record, and it is almost 2 hours already, still being cancelled (but not yet cancelled)… this is called “AI” :wink:

P.P.S. There is a discrepancy between “playground” and “batch”: “batch” interacts with the “completion” endpoint and I can choose many model variants, but I cannot reproduce exactly the same API call using the playground. So I am forced to experiment with “batch” to tune prompts; I cannot use “playground” for that.

Thank you for making it available! I have amazing results… in many cases I find it is indeed Artificial Intelligence, it invents correct answers (I verified: correct!) - and those answers are not found on Internet (I found few incorrect answers).


Now available, requiring the latest Python and node.js openai:

/v1/chat/completions, /v1/embeddings, and /v1/completions are supported.
Note that /v1/embeddings batches are also restricted to a maximum of 50,000
embedding inputs across all requests in the batch.

The API parameter for file ID separately states:

Your input file must be formatted as a JSONL file, and must be uploaded with the purpose batch. The file can contain up to 50,000 requests, and can be up to 100 MB in size.
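To stay inside both stated limits, the request lines can be split into several files before upload. A sketch, where `split_into_files` is a hypothetical helper and "100 MB" is conservatively read as 100,000,000 bytes:

```python
MAX_REQUESTS = 50_000      # stated request limit per file
MAX_BYTES = 100_000_000    # "100 MB", read conservatively as decimal megabytes

def split_into_files(lines, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Group JSONL lines into chunks that respect both limits, keeping order."""
    files, current, size = [], [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        too_many = len(current) >= max_requests
        too_big = size + line_bytes > max_bytes
        if current and (too_many or too_big):
            files.append(current)
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        files.append(current)
    return files
```

A single line larger than the byte limit still ends up in a file of its own, since it cannot be split; that case would need separate handling.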

Be aware that there have been failures to return the custom_id in results when the required custom_id value you generate is any more complex than ‘batch1234’.

I came here trying to find out how do we know or see the 50% discount? Running the completion in real time and in batch yields the same amount of tokens used.

It should cost you 50% less for the same amount of tokens that you process via the batch API.
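In other words, the token count stays the same and only the price applied to it changes. As plain arithmetic (the price passed in is whatever your model's regular rate is; the number in the example is made up, and `batch_cost` is a hypothetical helper):

```python
def batch_cost(tokens, price_per_million, discount=0.5):
    """Cost of processing `tokens` via batch at a discounted per-million price."""
    return tokens / 1_000_000 * price_per_million * (1 - discount)
```

With an illustrative price of 10.0 per million tokens, 2,000,000 tokens would cost 20.0 in real time and `batch_cost(2_000_000, 10.0)` gives 10.0 for batch.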

It looks like an artificial delay: when I submit a JSONL of 10 JSON objects, it processes 9 objects in less than a minute and then stops for a few hours at 10/10; when I submit 25, it stops at 25/25; and when I submit a really large 672kb file with 99 request JSONs, it stops at 99/99. Please fix it and document it properly and consistently; a few days ago the documentation was saying that the file size cannot be more than 500k, and now it says 100k, but I was able to submit 670k. If you are frequently changing the API and doing beta tests, then please provide a higher discount, because I have wasted too much time already.

UPDATE: I tried to submit 3 records: first two are instantly processed in a minute or two; then it stops “forever” at 3/3.

Yesterday I was patient, didn’t cancel any job, and at night time (night!!! off-peak) it took 1-2 hours per 25-record batch. It was much faster last week, 10-15 minutes per batch, with off-peak hours early morning (3am-4am California time). But not now. I am sure it is either a bug or an undocumented artificial delay (obviously the system is not busy: 98 records were processed in a few minutes!!! Then it stops “forever” on the last record).

I am wondering what it is; I tried it and it doesn’t work! The error message says “invalid endpoint”.

Also, regarding custom_id: I successfully used IDs such as “My_Interesting-Article(about_how_to_manage-AI?!!)” - but that was last week. It worked yesterday too.

I am not sure I understand your concern. The commitment is to have the batch processed within a 24h timeframe. If it gets processed before, then it is of course great but why do you expect that everything gets processed immediately and then cancel it before the 24h have elapsed?

Apologies if I miss a point here but I am genuinely curious to understand this.

Yes, 24 hours is SLA, but what about “250M input tokens” (see “commitment” in the initial message of this thread)?

And I was not able to submit more than 90,000 tokens last week: error message saying that 90,000 is the limit.

Plus, if it is called batch, why can’t I submit 10 files of 50,000 tokens each? The system tries to process them in parallel and responds with error messages.

Could you please document the exact commitments? Thanks,

Note: I submitted a batch of 80 tasks; 79 finished in a few minutes, and the last one has been pending for a few hours already. I submitted a second batch of 100 tasks; 99 finished in a few minutes and the last one has been pending for a few hours already. Ask AI what that means… but for me it is very obvious: the system has the necessary resources to process 79+99 records in a few minutes, and then it has an artificial delay to prevent us from using this API.

Hi, I’m still trying to understand how batchapi works. Is each request ID a file that holds several rows with IDs (let’s think of a table), and does the user content hold the question that will be checked for each row in the table?
Let’s say I have an SQL database: do I need to convert every 100 rows into a request ID file?

If it is called “batch”, then please allow client to submit, for example, 100 files, 100,000 tokens each, 2w (two weeks) processing time, 500kb size of a file, and process it serially for each project/client.

Right now, the de facto limit is 90,000 tokens, and if I submit a few files with 50,000 tokens each and come back the next day to check, I get error messages just because the system tries to process them all in parallel (instead of serially). The error messages say that I am trying to process 95,000 tokens (just because the system started processing File #2 without waiting for File #1 to finish).
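As a client-side workaround, files can be grouped so that the tokens in flight never exceed the limit, submitting the next group only after the previous one finishes. A sketch of the grouping step only; `plan_waves` is a hypothetical helper and the 90,000 default is the limit from the error message above, not a documented constant:

```python
def plan_waves(file_tokens, limit=90_000):
    """Group (name, tokens) pairs, in order, into waves whose token sum <= limit."""
    waves, current, used = [], [], 0
    for name, tokens in file_tokens:
        if tokens > limit:
            raise ValueError(f"{name} alone exceeds the limit of {limit} tokens")
        if used + tokens > limit:
            # Close the current wave; this file starts the next one.
            waves.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        waves.append(current)
    return waves
```

With files of 50,000, 50,000 and 30,000 tokens, this gives two waves: the first file alone, then the second and third together (80,000 tokens).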

If it is batch, I need to be able to come back in a week to check my hundred million tokens; splitting a hundred million into smaller files is a natural requirement, but trying to process a batch “in parallel” is inefficient, to say the least.

I’d also add that parallel processing could be super nice: what if you have a lot of resources available to process all the millions of tokens (from hundreds of files) at some point in time? Why waste resources doing nothing just because someone suggests making it serial (as a workaround)? But then you need to improve the rules: 100,000 tokens per single file, not for the sum of tokens from all “parallel” submissions.

Hope it helps.