Files API returns 504 when trying to download large content

I submitted batch embedding jobs within the limits mentioned in the documentation. Every job completed successfully and I can see that the files are available:

{
    "object":"file",
    "id":"file-...",
    "purpose":"batch_output",
    "filename":"batch_..._output.jsonl",
    "bytes":2116946514,
    ...
    "status":"processed",
    "status_details":null
}

However, when I try to download the file, I receive a 504 Gateway Timeout error after 1 minute:

curl -i  https://api.openai.com/v1/files/file-.../content -H "Authorization: Bearer $OPENAI_API_KEY" > file.jsonl

HTTP/2 504 
...

error code: 504

The same thing happens when using the openai npm package.

I couldn’t find relevant limits mentioned in the API reference. Is there a file size limit for downloading batch output files, or is there a recommended approach for retrieving large outputs?

I’ve got to wonder whether that file size itself isn’t a bug, like a sparse database gone nuts.

>>> import math
>>> math.log(2_116_946_514, 2)
30.979337673184506

2 GB is in “mail me a flash drive” territory. Is it even in the range of possibility that you were billed for perhaps 500M tokens, if this were language output? A single embedding is at most about 12 KB, and the cutoff should be 50,000 embeddings per batch, across all the calls and their lists.
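For what it’s worth, the reported size sits suspiciously close to the 2 GiB (2^31 bytes) boundary, which is part of why it smells like an overflow or limit rather than organic growth. A quick check:

```python
import math

reported_bytes = 2_116_946_514  # the "bytes" field from the file object above

# Log base 2 of the reported size: just under 31, i.e. just under 2 GiB.
print(math.log2(reported_bytes))

# How far short of the 2 GiB boundary the file is (~30 MB).
print(2**31 - reported_bytes)
```

That said, being *near* 2^31 could just as easily be a coincidence of the batch size, so this is only a hunch.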

We are processing large volumes of legal text, and this specific batch contained 50k input texts. I anticipated larger output sizes since we maxed out the limit, but not 2.1 GB (although there must be some math behind it).
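The arithmetic does roughly work out if the vectors are 3072-dimensional (e.g. text-embedding-3-large, which is my assumption, not stated in the thread) and serialized as JSON decimal floats in the JSONL output. A back-of-the-envelope sketch:

```python
n_inputs = 50_000               # input texts in the batch (from the post)
reported_bytes = 2_116_946_514  # "bytes" field on the output file

# Average size of one output line in the JSONL file (~42 KB).
bytes_per_line = reported_bytes / n_inputs

# Assumption: 3072-dim vectors, each float serialized as JSON text,
# roughly 13 characters including the comma separator.
dims = 3072
chars_per_float = 13
vector_text = dims * chars_per_float  # ~40 KB for the vector text alone
# The remaining ~2 KB per line would be the JSON wrapper
# (custom_id, response envelope, usage, etc.).

print(bytes_per_line, vector_text)
```

So ~2.1 GB is plausible for 50k embeddings as JSONL text; the same vectors in binary would be far smaller (3072 floats x 4 bytes is about 12 KB each, which matches the 12 KB figure mentioned above).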