Batch API result file download fails silently for large batches (>~200MB output)

  1. Symptom: Submitted 50,000 embedding requests using the Batch API, resulting in an output file of approximately 1GB. The download fails with the following error: peer closed connection without sending complete message body (received X bytes, expected Y bytes).
  2. Conditions to Reproduce: The number of batch lines approaches the 50,000 limit, and the output file size exceeds ~200-300MB.
  3. Documentation Gap: The official documentation does not mention any impact of output file size on download stability.
  4. Workaround: Limit each batch to under 10,000 lines (keeping the output around ~200MB) to ensure stable downloads.
  5. Environment: openai-python SDK, macOS. The issue is reproducible using both with_streaming_response.content() and the standard requests library.
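The split described in point 4 can be scripted. A minimal sketch (the 10,000-line cap comes from the workaround above; the file names and helper name are mine), which splits the batch input JSONL into sub-batch files before submission:

```python
# Workaround sketch: split a large batch input JSONL into sub-batches
# of at most 10,000 lines each, so each output file stays ~200MB.
from pathlib import Path

def split_jsonl(src: str, max_lines: int = 10_000) -> list[str]:
    """Split src into chunk files of at most max_lines lines each."""
    src_path = Path(src)
    lines = src_path.read_text().splitlines(keepends=True)
    chunks = []
    for i in range(0, len(lines), max_lines):
        # e.g. batch.jsonl -> batch_part000.jsonl, batch_part001.jsonl, ...
        chunk_path = src_path.with_name(
            f"{src_path.stem}_part{i // max_lines:03d}.jsonl"
        )
        chunk_path.write_text("".join(lines[i:i + max_lines]))
        chunks.append(str(chunk_path))
    return chunks
```

Each chunk file can then be submitted as its own batch, and the per-batch output files stay small enough to download reliably.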

Not sure if I'm the only one hitting this issue; I didn't see it mentioned in the developer docs.
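For the download itself, if the file content endpoint honors HTTP Range requests (I have not confirmed this in the docs), a resumable download could recover from the mid-body disconnect instead of restarting from scratch. A sketch using requests; the URL, headers, and retry policy are placeholders, not a confirmed API behavior:

```python
# Hedged sketch: resume an interrupted HTTP download from the byte
# offset already written to disk, ASSUMING the server supports Range
# requests (returns 206 Partial Content). Verify before relying on it.
import os
import requests

def download_with_resume(url: str, dest: str, headers: dict,
                         max_attempts: int = 5, session=None) -> int:
    """Download url to dest, resuming from the current file size on truncation."""
    sess = session or requests.Session()
    for _ in range(max_attempts):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        req_headers = dict(headers)
        if offset:
            req_headers["Range"] = f"bytes={offset}-"
        try:
            with sess.get(url, headers=req_headers, stream=True, timeout=60) as r:
                r.raise_for_status()
                # If the server ignored Range (200 instead of 206),
                # rewrite from scratch rather than appending a full copy.
                resumed = bool(offset) and r.status_code == 206
                with open(dest, "ab" if resumed else "wb") as f:
                    for chunk in r.iter_content(chunk_size=1 << 20):
                        f.write(chunk)
            return os.path.getsize(dest)
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError):
            continue  # connection dropped mid-body; retry from the new offset
    raise RuntimeError(f"download failed after {max_attempts} attempts")
```

If the endpoint turns out not to support Range, each retry simply restarts the full download, which at least automates recovery but won't help with consistently truncated 1GB transfers.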

Thank you in advance.

Hi and welcome to the community!

Have you tried downloading the batch via the platform? This should get you unstuck for now so you can retrieve your embeddings.

https://platform.openai.com/batches
