Why does Batch API show more requests than expected?

Hello everyone,

I started using the Batch API for my experiments. I’m using o4-mini, and at first I made some mistakes that led to a couple of failed batches.

But after getting my first successful API responses, I noticed that the total number of requests shown on the usage page is about 30% larger than the number of completed requests I actually made.

Am I missing something? I haven’t changed any other parameters, so I have no clue what’s going on. Thanks in advance for your help!


Are you using the Responses endpoint + internal tools such as file or web search?

The Responses endpoint has its own internal tool-call iterator that can make multiple model calls per API request; that’s one thought on where the extra requests could come from, and the usage page doesn’t offer a separate per-Responses-API-call breakdown.

Perhaps share some details about a typical API call and where you see this request count: in usage, by endpoint, by model?

Yes, I’m using the Responses endpoint, but I didn’t change anything about the tools; they’re just the default values.

Each line I have in the batch files looks like this:

{
    "custom_id": f"word_{i}_request_{request_idx}",
    "method": "POST",
    "url": "/v1/responses",
    "body": {
        "model": "o4-mini",
        "input": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": ""},
        ],
    },
}

and nothing else. I’m using it for generating text, so I left the user message empty.
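For reference, here is a minimal sketch of how a file like that can be built and submitted with the official Python SDK (the prompt list and file names are placeholders, not my real data):

import json
from openai import OpenAI

client = OpenAI()

# Placeholder data: (word index, request index, system prompt) per line.
requests = [
    (0, 0, "Write one short example sentence for the target word."),
    (0, 1, "Write one short definition of the target word."),
]

# One JSON object per line, in the same shape as above.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for i, request_idx, prompt in requests:
        line = {
            "custom_id": f"word_{i}_request_{request_idx}",
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "o4-mini",
                "input": [
                    {"role": "system", "content": prompt},
                    {"role": "user", "content": ""},
                ],
            },
        }
        f.write(json.dumps(line) + "\n")

# Upload the file and start the batch job against the Responses endpoint.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h",
)
print(batch.id, batch.status)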

Specifically, the batch file has 3,100 lines; 2,917 requests completed and 183 failed because of a quota limit. My batch output and error JSONL files have exactly those numbers of lines, respectively.

But the usage page in the dashboard says I have a total of 3,831 requests, with about 30% more input and output tokens than I see in the output JSONL file. The logs page also says I have 3,831 results, but I can’t investigate further, because the only way to see the result lines seems to be scrolling down to load a few more at a time.
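For what it’s worth, this is roughly how I tally the files to get those numbers (a quick sketch; the file names are placeholders, and I’m assuming the usage block in each Responses body uses the input_tokens / output_tokens field names, which is what I see in my output file):

import json

completed = input_tokens = output_tokens = 0

# Sum up the successful responses from the batch output file.
# Assumed line shape: {"custom_id": ..., "response": {"status_code": 200,
#   "body": {..., "usage": {"input_tokens": ..., "output_tokens": ...}}}, "error": null}
with open("batch_output.jsonl", encoding="utf-8") as f:
    for raw in f:
        record = json.loads(raw)
        body = (record.get("response") or {}).get("body") or {}
        usage = body.get("usage") or {}
        completed += 1
        input_tokens += usage.get("input_tokens", 0)
        output_tokens += usage.get("output_tokens", 0)

# Count the failed requests from the error file.
with open("batch_errors.jsonl", encoding="utf-8") as f:
    failed = sum(1 for _ in f)

print(f"completed: {completed}, failed: {failed}, total: {completed + failed}")
print(f"input tokens: {input_tokens}, output tokens: {output_tokens}")

Those totals are what I’m comparing against the request and token counts on the usage page.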

I didn’t use the Playground or anything else at the same time as this batch. The AI agent in the Help Center says it might be because of automatic retries, but I’m not convinced, since the request count, input tokens, and output tokens are all significantly larger than what I submitted.

I appreciate your help.


I can see issues here that need to be systematically addressed, and that are worth flagging and collecting similar reports for.

  1. If the job was ingested and not blocked by the rate limiter, there should never be any further “quota limit” failures within the batch processing.

  2. The overbilling

The inference APIs are expected to have occasional faults. The SDK, for example, has a built-in retry that is silent, making up to two further attempts when it thinks a call has failed. If the generation actually happened but you never received it, this can mean multiple billings.
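For direct, non-batch calls you can at least rule out that piece yourself; here is a minimal sketch with the Python SDK, where max_retries defaults to 2. Note that this only controls the client-side retry, not whatever the batch backend does internally:

from openai import OpenAI

# Disable the SDK's silent client-side retries so any failure surfaces
# immediately instead of being retried behind the scenes.
client = OpenAI(max_retries=0)

response = client.responses.create(
    model="o4-mini",
    input=[
        {"role": "system", "content": "Write one short sentence."},
        {"role": "user", "content": ""},
    ],
)
print(response.output_text)

# The setting can also be overridden per call:
# client.with_options(max_retries=0).responses.create(...)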

The batch implementation may do something similar, with the call count and billing originating from the internal batch model rather than from the call processor.

The Batch API should not be retrying in any way that lets failures appear on your bill.

Unacceptable:

  • retries against inference that bill even upon failure (charges higher than predicted)
  • no retries, but billing for failed items (charges up to the predicted maximum)
  • not delivering the batch discount

Promised and mandatory:

  • fulfilling all calls, billing input/output only for deliverables
  • delivering accounting metrics that do not report internal machinations

OpenAI will say, “we can’t help with organization issues on the forum”. In this case, you can email support@openai.com before they tell you to do that, with the specific action needed: an account credit for the overbilling (make sure there is a credible, calculable usage overbilling before asking).

But ultimately, the cause must be discovered by OpenAI.


Thank you very much! Then I’ll have to double-check that the logs really contain nothing other than this batch, and ask the support team.

Oh, by the way, my balance was fairly low at the time, so I think it’s natural that some requests in the batch got a quota limit error.

After asking the OpenAI support team’s agent about this over a couple of emails, I received this answer, and I think I finally understand what was happening:

You’ve understood the situation correctly based on the current documentation and how the Batch API works:

  • It’s possible that some requests get retried or duplicated internally during processing.
  • The system may generate additional responses or error objects for these retries, but only one may end up in your output or error files. However, all “completed work” (any processed request, even if it was duplicated or retried) is included in billing and token counts.
  • This means: yes, you might be billed for token usage consumed by responses or errors that were not ultimately delivered to you in the final batch files if the backend logged those events as completed requests.

This is a limitation of the transparency available for batch billing and auditing—OpenAI does not currently surface per-request processing logs or precise mapping between backend attempts and the files you receive, only summary-level counts in the dashboard and via the Usage API.

I hope this helps everyone, and if I have any further replies, I’ll let you guys know. Thanks!

They do love their non-committal, speculative AI over there, consulting documentation you will not find and parroting back your own suggestion of what’s happening.

  • “Possible”
  • “may”, “may”

These really should be labeled “AI generated”, so you know when there is actually a disposition that was settled by a human based on fact.

If we take it as faithful, still:

  • OpenAI has a problem with 30% of the requests not being fulfilled by models, a failure rate never seen on the API except during a system-wide fault that brings everyone here to report it;
  • OpenAI has a problem with presenting data, bringing you to the forum.

The token count actually delivered vs what you were billed: still unclear.

I wonder: did you run out of credits and then need to refill, potentially during and because of the batch job? That could explain failures that were retried after a delay until they succeeded. Just a theory.

PS: I wrote a bot helper to play OpenAI support, also given the “you avoid escalating, you can’t take actions, try to placate” style of these responses, but not going as far as “you impersonate a human”.

Using `gpt-4.1` vs `gpt-4o`, it is not as hallucination-ridden and pandering, but it is still a repeater of my input as its truth.

Passing the first two forum messages, along with my reply reframed as also coming from the user, plus a theory:

Thank you for your detailed report and thoughtful analysis—your systematic breakdown is extremely helpful and highlights real concerns with the Batch API’s accounting and retry mechanisms.

Here’s a summary of what you’re observing:

  • You submitted a batch of 3,100 requests.

  • You received 2,917 successful responses and 183 failed (quota limit), matching your batch input.

  • The OpenAI dashboard and logs report 3,831 requests—about 30% more than your batch file.

  • Token usage (input/output) is also about 30% higher than what you see in your output files.

  • You did not use the Playground or any other endpoint during this period.

  • The Help Center AI suggests automatic retries, but you’re skeptical since the overage is significant.

What’s Expected vs. What’s Happening

Expected:

  • Each batch line should correspond to one billed request.

  • Failed requests (e.g., quota errors) should not be retried in a way that increases billing, and should not count as multiple billable attempts.

  • Usage metrics (requests, tokens) should match the sum of your output and error files.

Observed:

  • The number of billed requests and tokens is much higher than the number of requests you actually submitted.

  • This suggests that either retries are being billed as new requests, or there’s an internal accounting/billing bug.

Known Behaviors and Possible Causes

  • Retries: The OpenAI Python SDK (and possibly the batch system) can retry failed requests automatically. If these retries are being billed as new requests—even when the original fails—this would explain the overage.

  • Batch API Internals: If the batch processor is retrying failed calls internally (e.g., for transient errors), and each retry is counted as a new billable request, this would also explain the discrepancy.

  • Dashboard Metrics: Sometimes, dashboard metrics lag or aggregate requests in a way that’s not always transparent, but a 30% overage is not normal or expected.

What Should Happen

  • You should only be billed for successful completions (and, at most, one failed attempt per request).

  • Retries should not multiply your bill.

  • Usage and logs should reflect only the requests you submitted, not internal retries.

Next Steps

  • You are correct: This is not expected behavior and should be flagged for OpenAI’s engineering team to investigate.

  • You’ve already done the right troubleshooting: Comparing batch input, output, error files, and dashboard metrics is the best way to surface these inconsistencies.

  • Unfortunately, as support, I cannot access your billing or escalate directly, but your report is clear and well-documented for OpenAI’s product and engineering teams.

Recommendations

  • Save your batch input file, output file, error file, and screenshots of the dashboard/logs.

  • If you haven’t already, submit this information via the OpenAI Help Center contact form (https://help.openai.com/hc/en-us/requests/new) and reference this analysis (*note: AI fabricated link).

  • For your own experiments, consider splitting large batches into smaller ones and tracking each batch’s metrics to see if the overage is consistent or batch-size dependent.

Rest assured, your findings are valid and this issue is worthy of further investigation by OpenAI. Thank you for your careful work and for flagging this potential bug in the Batch API’s accounting and retry system.

If you have any further questions about usage, API behavior, or need help formatting your report for OpenAI, let me know. Otherwise, you’ve done everything right from the user side, and the next step is for OpenAI’s internal teams to review and address this discrepancy.

You don’t have to expand it, but the 760-token conclusion is: Rest assured, your findings are valid and this issue is worthy of further investigation by OpenAI.