TL;DR: I built a typed Python wrapper around the OpenAI Batch API that handles the annoying parts of the batch workflow — JSONL prep/upload, job submission, polling, result download, and mapping outputs back to your task IDs.
Package: openai-batch-helper
Install: pip install openai-batch-helper
Hi folks — sharing a small Python package I built to make the OpenAI Batch API much easier to use in real projects.
I’m a data scientist by day, and I also tinker with the OpenAI API in my spare time. Recently I was prototyping an idea that required running a large number of requests offline (I ran a batch of roughly 5,000 tasks overnight, processing some old family photos), so the Batch API fit perfectly.
But once I started using it, I hit a few pain points pretty quickly:
- Managing JSONL input/output files manually
- Uploading files, submitting jobs, then polling until completion
- Downloading results as JSONL again, then parsing and mapping responses back to my original tasks
- Generally: a workflow that gets messy fast when you’re trying to keep things typed and production-friendly
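For context on the first pain point: the Batch API wants each request serialized as one JSON line with a `custom_id`, an HTTP method, a URL, and a body. A minimal sketch of that prep step — `build_request_line` is my own illustration of what the helper automates, not part of the package:

```python
import json


def build_request_line(custom_id: str, body: dict,
                       endpoint: str = "/v1/chat/completions") -> str:
    """Serialize one task into the JSONL record the Batch API expects."""
    record = {
        "custom_id": custom_id,  # your own ID, echoed back in the results
        "method": "POST",
        "url": endpoint,
        "body": body,
    }
    return json.dumps(record)


# One line per task goes into the input file you upload.
line = build_request_line("t1", {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
})
print(line)
```

Doing this by hand for thousands of tasks (and keeping the IDs straight) is exactly the boilerplate that kept piling up.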
So I built openai-batch-helper to keep the batch flow tidy and ergonomic.
One fun/interesting note: most of this package was built via Codex + prompt engineering. I mainly guided the implementation, reviewed the outputs, and then added a handful of comments and small modifications to get it over the finish line. Also, just a word of caution:
Batch can save you ~50%, but running hundreds of requests in Batch can still drain your credits way faster than real-time calls. Ask me how I know. And please don’t tell my wife :)
Here’s the gist (quickstart):
```python
from openai_batch_helper import BatchHelper, status_progress_logger

# Assumes OPENAI_API_KEY is set in your environment variables;
# see the documentation for other ways to initialize.
helper = BatchHelper(
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

job = helper.init_job()

job.add_task(
    "t1",
    body={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Be concise."},
            {"role": "user", "content": "Explain idempotency in one sentence."},
        ],
    },
)
job.add_task(
    "t2",
    body={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Be concise."},
            {"role": "user", "content": "List 3 benefits of unit tests."},
        ],
    },
)

(job
    .submit_file()
    .submit_batch_job(metadata={"project": "demo"})
    .wait_for_completion(poll_seconds=5.0, on_update=status_progress_logger()))

print(job.download_result())
print(job.map_by_custom_id())
```
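For the curious: the Batch API returns results as JSONL too, one line per request, each echoing your `custom_id`. A hedged sketch of the mapping step — this is my own illustration of roughly what a `map_by_custom_id()`-style helper has to do, not the package’s actual code:

```python
import json


def map_results_by_custom_id(output_jsonl: str) -> dict:
    """Parse Batch output JSONL into {custom_id: response_body}."""
    results = {}
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Each record carries the custom_id plus either a response or an error.
        response = record.get("response") or {}
        results[record["custom_id"]] = response.get("body")
    return results


sample = '{"custom_id": "t1", "response": {"status_code": 200, "body": {"choices": []}}}'
print(map_results_by_custom_id(sample))
```

Output order in the result file is not guaranteed to match input order, which is why keying everything by `custom_id` (rather than by position) is the safe way to join results back to your tasks.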
Hope this is useful — comments/suggestions/PRs welcome!