Critical Issues with Batch API - Detailed Report and Observations

Hello,
I’m reaching out publicly after roughly two weeks of extensive testing with the Batch API, which has consumed significant personal time and resources. Despite multiple attempts to resolve these issues through direct support channels, the problems persist, so I am posting a detailed, transparent report here.

Summary of Main Issues:

  1. Billing discrepancies: Costs are unpredictable and do not match actual token usage.
  2. High failure rates: Batch requests frequently fail, with failure rates of up to 90% per batch.

Detailed Results of Latest Experiment:

Experiment Conditions:

  • Start: April 26, 2025, at 04:24
  • Duration: 2 days, 6 hours, 24 minutes, and 17 seconds
  • Total batches sent: 93 (varying from 1 to 100 items per batch)
  • Model used: o4-mini-2025-04-16

Request Statistics:

  • API Responses: Total requests – 885; Successful – 758; Failed – 97 (see the request-tally sketch after this list).
  • Dashboard / Usage: Reports 1,838 total requests.
  • Conclusion: The Dashboard figures appear significantly inflated and inconsistent with actual API data.
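
For reference, request totals like these can be cross-checked straight from the API rather than the Dashboard. A minimal sketch, assuming the official openai Python SDK (v1.x); it simply sums the request_counts reported on every batch object in the project:

# Tally request counts from the Batch API itself (not the Dashboard).
# Assumes the official openai Python SDK; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

total = succeeded = failed = 0
for batch in client.batches.list(limit=100):  # the SDK auto-paginates on iteration
    counts = batch.request_counts
    total += counts.total
    succeeded += counts.completed
    failed += counts.failed

print(f"requests: total={total} succeeded={succeeded} failed={failed}")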

Token Statistics:

  • API Responses: Input – 5,636,680 tokens; Cached – 0; Output – 15,231,993 tokens (see the token-tally sketch after this list).
  • Dashboard / Usage: Input – 6,229,000 tokens; Output – 21,323,000 tokens.
  • Conclusion: Even disregarding my involvement in the free token program (up to 10 million tokens for the o4-mini model), Dashboard numbers are unreasonably high.
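
For completeness, token totals like the “API Responses” line can be reproduced from the batch output files. A minimal sketch, assuming the official openai Python SDK and the standard Batch output JSONL layout (one JSON record per request, with the completion body under response.body):

# Sum token usage from the output files of completed batches.
# Field names follow the Batch output record layout; adjust if yours differs.
import json
from openai import OpenAI

client = OpenAI()

prompt_toks = cached_toks = completion_toks = 0
for batch in client.batches.list(limit=100):
    if batch.status != "completed" or not batch.output_file_id:
        continue
    for line in client.files.content(batch.output_file_id).text.splitlines():
        rec = json.loads(line)
        usage = ((rec.get("response") or {}).get("body") or {}).get("usage") or {}
        prompt_toks += usage.get("prompt_tokens", 0)
        completion_toks += usage.get("completion_tokens", 0)
        cached_toks += (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)

print(f"input={prompt_toks} cached={cached_toks} output={completion_toks}")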

Billing Statistics:

  • Calculation based on actual token usage: Input – $3.100174; Output – $33.510385; Total – $36.610559 (see the calculation sketch after this list).
  • Dashboard / Usage: Input – $3.653; Output – $48.803; Total – $52.46.
  • Billing: Actual balance reduction matches Dashboard usage exactly.
  • Conclusion: Actual charges are roughly 1.4 times the amount calculated from token usage ($52.46 vs. $36.61).
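
The calculation above is easy to reproduce. A minimal sketch using the o4-mini batch rates implied by the per-batch price tables later in this thread ($0.55 per 1M input tokens, $2.20 per 1M output tokens; cached input is zero here, so it is omitted):

# Expected batch-tier cost from the token totals reported by the API.
# Rates are those implied by the per-batch price tables below (o4-mini, batch tier).
BATCH_IN_PER_M = 0.55    # USD per 1M input tokens
BATCH_OUT_PER_M = 2.20   # USD per 1M output tokens

input_tokens = 5_636_680
output_tokens = 15_231_993

in_cost = input_tokens / 1_000_000 * BATCH_IN_PER_M     # 3.100174
out_cost = output_tokens / 1_000_000 * BATCH_OUT_PER_M  # 33.510385
print(f"in=${in_cost:.6f} out=${out_cost:.6f} total=${in_cost + out_cost:.6f}")
# -> in=$3.100174 out=$33.510385 total=$36.610559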

Observations:

  • The discrepancy in the number of requests is extremely large even after accounting for failed requests; the Dashboard figure is more than double the real one.
  • This discrepancy likely affects token count accuracy as well.
  • A key oversight on my part was not documenting this issue thoroughly when I first discovered it; at that point, funds were being depleted roughly 10 times faster than expected. Furthermore, there is currently no transparency into how my free token allocation is being consumed, which complicates accurate billing verification. Nonetheless, even excluding free tokens entirely, I have been charged at least 1.5 times more than documented usage would justify.

I kindly ask OpenAI to urgently investigate and address these significant discrepancies.

Thank you for your attention to this critical matter.

A new test in a fresh project (to isolate the statistics). New, even more incredible results:

Real token usage: 0 (zero)
Dashboard: 600k in / 2.1M out



For pending jobs showing up in billing, I would wonder if it’s equivalent to “preauthorizing your card”, where your input and max_tokens are evaluated against your credit balance to pay for the job.

That’s how it is with fine-tuning, where you needed enough funds to pay for the job even when it was free.

If funds weren’t held back per job, you could submit thousands of $0.10 batches against a $0.10 balance.
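
If that is what is happening, a pending batch could temporarily look far more expensive than its eventual real usage. A rough, purely hypothetical illustration of such a hold, using the o4-mini batch rates quoted elsewhere in this thread and invented request parameters; this is speculation about the mechanics, not a description of OpenAI’s actual billing:

# Hypothetical worst-case "hold" for a pending batch: all input tokens plus
# max_tokens for every request, priced at batch rates.
BATCH_IN_PER_M, BATCH_OUT_PER_M = 0.55, 2.20  # USD per 1M tokens (o4-mini, batch tier)

def worst_case_hold(input_tokens: int, n_requests: int, max_tokens: int) -> float:
    return (input_tokens * BATCH_IN_PER_M
            + n_requests * max_tokens * BATCH_OUT_PER_M) / 1_000_000

# e.g. a 100-request batch with ~550k input tokens and max_tokens=20000 per request
print(f"possible hold: ${worst_case_hold(550_000, 100, 20_000):.2f}")  # -> $4.70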

Then you may need to wait a while for the billing to settle, just as usage billing can sometimes only show up the following day.

I am willing to accept the described mechanics, provided that once a job completes, the resulting figures align with actual usage. As noted in my initial post, even after all batches have completed, there remains a substantial discrepancy between the reported figures and the requests actually submitted and tokens actually consumed.

Batches have completed, and the updated statistics are ready.

Dashboard shows:

  • $22.01 charged
  • 883 requests
  • Token usage: input 6.4M, output 22.5M

Actual usage:

  • $7.85 should have been charged
  • 174 requests
  • Token usage: input 1.3M, output 3.2M

The charge is nearly three times what it should have been.

Support remains silent.
The forum remains silent.
Either no one is using batches, no one is monitoring their batch budgets, or my case is unique.

=== Batch batch_6810d84de0688190a6c8376dc90f7d17 ===
Requests   Succeeded    Failed                 Start    Duration
----------------------------------------------------------------
100               73        27      2025-04-29 13:46    10:23:49

Model token usage          Calls          In       Cache         Out
----------------------------------------------------------------------
o4-mini-2025-04-16            73      544204           0     1408034

Price            In($)    Cache($)      Out($)    Total($)
----------------------------------------------------------------------
Direct       $0.598624   $0.000000   $6.195350   $6.793974
Batch        $0.299312   $0.000000   $3.097675   $3.396987

=== Batch batch_6810d8122b148190baaf04eebbfa01f6 ===
Requests   Succeeded    Failed                 Start    Duration
----------------------------------------------------------------
100               34        66      2025-04-29 13:45    14:13:16

Model token usage          Calls          In       Cache         Out
----------------------------------------------------------------------
o4-mini-2025-04-16            34      262782           0      567024

Price            In($)    Cache($)      Out($)    Total($)
----------------------------------------------------------------------
Direct       $0.289060   $0.000000   $2.494906   $2.783966
Batch        $0.144530   $0.000000   $1.247453   $1.391983

=== Batch batch_6810d7d9037081908f8eb94c7e2225dd ===
Requests   Succeeded    Failed                 Start    Duration
----------------------------------------------------------------
100               67        33      2025-04-29 13:44    10:25:45

Model token usage          Calls          In       Cache         Out
----------------------------------------------------------------------
o4-mini-2025-04-16            67      523564           0     1263767

Price            In($)    Cache($)      Out($)    Total($)
----------------------------------------------------------------------
Direct       $0.575920   $0.000000   $5.560575   $6.136495
Batch        $0.287960   $0.000000   $2.780287   $3.068248

=== Overall Summary ===
Requests   Succeeded    Failed                 Start    Duration
----------------------------------------------------------------
300              174       126      2025-04-29 13:44    14:14:13

Model token usage          Calls          In       Cache         Out
----------------------------------------------------------------------
o4-mini-2025-04-16           174     1330550           0     3238825

Price            In($)    Cache($)      Out($)    Total($)
----------------------------------------------------------------------
Direct       $1.463605   $0.000000  $14.250830  $15.714435
Batch        $0.731803   $0.000000   $7.125415   $7.857217

Thanks for taking the time to flag this; I have raised it with OpenAI.