OpenAI File Size Limit and Batch API result files

I am using the Batch API to embed a large amount of documents. OpenAI has a total file storage limit of 100 GB per organization, which affects how much data I can have uploaded at any one time for processing.

What about the result files? Does their size count towards this limit? Can I simply upload 100 GB of documents for embedding and still be able to retrieve the processed batch output (which can be several times larger than the uploaded documents)?


Welcome to the Forum!

There are a couple of points to be mindful of here:

  1. You would not upload the actual documents. Instead, you would create a JSONL file containing the chunks of text from the documents to be embedded (see the sketch after this list). Each chunk must be within the token limit of the embedding model you are looking to use.

  2. Furthermore, the following constraint is in place for batches:

The file can contain up to 50,000 requests, and can be up to 100 MB in size.

Source: https://platform.openai.com/docs/api-reference/batch
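As a rough sketch of what that JSONL file could look like (the file name, chunking, and choice of text-embedding-3-small are assumptions, not requirements), each line is one standalone embedding request:

```python
import json

# Assumption: `chunks` is a list of text chunks already split so that each
# fits within the embedding model's input token limit.
chunks = ["First chunk of a document...", "Second chunk of a document..."]

with open("embedding_batch.jsonl", "w", encoding="utf-8") as f:
    for i, chunk in enumerate(chunks):
        request = {
            "custom_id": f"chunk-{i}",              # must be unique within the batch
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {
                "model": "text-embedding-3-small",  # assumed model choice
                "input": chunk,
            },
        }
        f.write(json.dumps(request) + "\n")
```

Each line is one request, so if your data would exceed the 50,000-request or 100 MB limit quoted above, you would split it across multiple batch files.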


I just submitted a bunch of files with 50k requests each but didn’t notice the batch size limit until now. The JSONL files are larger than 100 MB. I’ve gotten a couple of small error files, but most of it has processed. Can I assume that the requests that weren’t in the error file have been processed correctly? Or do I need to run the whole job again?


Hi there and welcome to the Forum!

Your best course of action is to retrieve and inspect the batch results. You can do that by first retrieving the batch and then requesting the output file containing the results, using the output file ID. You can find directions for these steps here and here.
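For example, with the official Python SDK this could look roughly like the sketch below (the batch ID is a placeholder, and checking `custom_id`s against your input is just one way to verify coverage):

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Retrieve the batch to check its status and get the output/error file IDs.
batch = client.batches.retrieve("batch_abc123")  # placeholder batch ID
print(batch.status, batch.request_counts)

# Download the output file and collect the custom_ids that returned results.
completed_ids = set()
if batch.output_file_id:
    content = client.files.content(batch.output_file_id)
    for line in content.text.splitlines():
        result = json.loads(line)
        completed_ids.add(result["custom_id"])
    print(f"{len(completed_ids)} requests returned results")

# Requests listed in the error file (or missing from the output file)
# would need to be resubmitted in a new batch.
if batch.error_file_id:
    errors = client.files.content(batch.error_file_id)
    print(errors.text[:500])
```

Comparing the `custom_id`s in the output file against the ones you submitted tells you exactly which requests, if any, you need to rerun.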
