OpenAI Batch Helper: making Batch API jobs painless

TL;DR: I built a typed Python wrapper around the OpenAI Batch API that handles the annoying parts of the workflow: JSONL prep/upload, job submission, polling, result download, and mapping outputs back to your task IDs.

Package: openai-batch-helper
Install: pip install openai-batch-helper

Hi folks — sharing a small Python package I built to make the OpenAI Batch API much easier to use in real projects.

I’m a data scientist by day, and I also tinker with the OpenAI API in my spare time. Recently I was prototyping an idea that required running a large number of requests offline (I ran it on a batch of ~5000+ tasks overnight, processing some old family photos), so the Batch API fit perfectly.

But once I started using it, I hit a few pain points pretty quickly:

  • Managing JSONL input/output files manually

  • Uploading files, submitting jobs, then polling until completion

  • Downloading results as JSONL again, then parsing and mapping responses back to my original tasks

  • In general, a workflow that gets messy fast when you’re trying to keep things typed and production-friendly
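For context on the first pain point: doing it by hand means building each JSONL line yourself in the shape the Batch API expects (`custom_id`, `method`, `url`, `body`, per the OpenAI Batch docs). A minimal sketch of that manual step — `make_batch_line` is my own name, not part of any library:

```python
import json

def make_batch_line(custom_id: str, body: dict) -> str:
    """Build one JSONL line in the shape the OpenAI Batch API expects."""
    return json.dumps({
        "custom_id": custom_id,   # your own ID, echoed back in the results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": body,             # same payload you'd send in a real-time call
    })

line = make_batch_line("t1", {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
})
record = json.loads(line)
```

You then write one such line per task to a file, upload it, and keep track of which `custom_id` belongs to which task — exactly the bookkeeping the helper takes off your hands.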

So I built openai-batch-helper to keep the batch flow tidy and ergonomic.

One fun/interesting note: most of this package was built via Codex + prompt engineering. I mainly guided the implementation, reviewed the outputs, and then added a handful of comments and small modifications to get it over the finish line. Also, just a word of caution:

Batch can save you ~50%, but running hundreds of requests in Batch can still drain your credits way faster than real-time calls would have. Ask me how I know. And please don’t tell my wife 😄

Here’s the gist (quickstart):

from openai_batch_helper import BatchHelper, status_progress_logger

# Assumes OPENAI_API_KEY is set in your environment variables;
# see the documentation for other ways to initialize.

helper = BatchHelper(
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
job = helper.init_job()

job.add_task(
    "t1",
    body={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Be concise."},
            {"role": "user", "content": "Explain idempotency in one sentence."},
        ],
    },
)
job.add_task(
    "t2",
    body={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Be concise."},
            {"role": "user", "content": "List 3 benefits of unit tests."},
        ],
    },
)

(job
 .submit_file()
 .submit_batch_job(metadata={"project": "demo"})
 .wait_for_completion(poll_seconds=5.0, on_update=status_progress_logger()))

print(job.download_result())
print(job.map_by_custom_id())
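For a sense of what the mapping step is dealing with: each line of the batch output file carries the `custom_id` you set on input, alongside the response. A rough sketch of the idea, assuming the documented output-line shape — the sample data and `map_results` helper are mine, not the package's internals:

```python
import json

# Two fake output lines in the Batch API's documented result shape.
sample_output = "\n".join([
    json.dumps({"custom_id": "t1",
                "response": {"status_code": 200,
                             "body": {"choices": [{"message": {"content": "Answer 1"}}]}},
                "error": None}),
    json.dumps({"custom_id": "t2",
                "response": {"status_code": 200,
                             "body": {"choices": [{"message": {"content": "Answer 2"}}]}},
                "error": None}),
])

def map_results(jsonl_text: str) -> dict:
    """Return {custom_id: parsed result record} for each output JSONL line."""
    return {rec["custom_id"]: rec
            for line in jsonl_text.splitlines()
            if (rec := json.loads(line))}

results = map_results(sample_output)
answer = results["t1"]["response"]["body"]["choices"][0]["message"]["content"]
```

With the helper, `map_by_custom_id()` gives you that dictionary directly, so your original task IDs line up with their responses without any manual parsing.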

Hope this is useful — comments/suggestions/PRs welcome!