Hello,
I’m posting publicly after roughly two weeks of extensive testing with the Batch API, into which I have invested significant personal time and money. Despite multiple attempts to resolve these issues through direct support channels, the problems persist, so I am providing a detailed, transparent report here.
Summary of Main Issues:
Billing discrepancies: Costs are unpredictable and do not match actual token usage.
High failure rates: Batch requests frequently fail, at rates of up to 90%.
Detailed Results of Latest Experiment:
Experiment Conditions:
Start: April 26, 2025, at 04:24
Duration: 2 days, 6 hours, 24 minutes, and 17 seconds
Total batches sent: 93 (varying from 1 to 100 items per batch)
Model used: o4-mini-2025-04-16
Request Statistics:
API Responses: Total requests – 885; Successful – 758; Failed – 97.
Dashboard / Usage: Reports 1,838 total requests.
Conclusion: The Dashboard figures appear significantly inflated and inconsistent with the actual API data. Even setting aside my participation in the free-token program (up to 10 million tokens for the o4-mini model), the Dashboard numbers are unreasonably high.
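One way to get a ground-truth request count independent of the Dashboard is to tally outcomes directly from the downloaded batch output files (one JSON line per request). A minimal sketch; the sample lines and the success criterion (HTTP 200 with no error) are my assumptions for illustration, not data from this experiment:

```python
# Tally per-request outcomes from a Batch API output file (JSONL) to
# cross-check the request counts shown in the Dashboard.
import json

# Illustrative sample lines, not real responses:
sample_output = """\
{"custom_id": "req-1", "response": {"status_code": 200}, "error": null}
{"custom_id": "req-2", "response": {"status_code": 200}, "error": null}
{"custom_id": "req-3", "response": null, "error": {"code": "server_error"}}
"""

def tally(jsonl_text: str) -> dict:
    counts = {"total": 0, "succeeded": 0, "failed": 0}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        counts["total"] += 1
        resp = rec.get("response")
        # Treat HTTP 200 with a null error field as success.
        if resp and resp.get("status_code") == 200 and rec.get("error") is None:
            counts["succeeded"] += 1
        else:
            counts["failed"] += 1
    return counts

print(tally(sample_output))  # {'total': 3, 'succeeded': 2, 'failed': 1}
```

Summing these tallies over all 93 output files would give a count that can be compared directly against the 1,838 requests the Dashboard reports.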
Billing Statistics:
Calculation based on actual token usage: Input – $3.100174; Output – $33.510385; Total – $36.610559.
Billing: Actual balance reduction matches Dashboard usage exactly.
Conclusion: Actual charges exceed calculated amounts by approximately 1.5 times.
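For anyone wanting to reproduce this kind of cross-check, the calculation is simple: price the actual token usage and compare it against the balance reduction. The per-token rates and the token totals below are assumptions I chose for illustration; substitute your model's actual rates and your own usage figures:

```python
# Minimal sketch of the billing cross-check.
# NOTE: rates and token totals below are ASSUMPTIONS for illustration,
# not figures from the report above.
INPUT_RATE_PER_M = 0.55   # USD per 1M input tokens (assumed batch-tier rate)
OUTPUT_RATE_PER_M = 2.20  # USD per 1M output tokens (assumed batch-tier rate)

def expected_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost implied by actual token usage at the assumed rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

calc = expected_cost(5_000_000, 15_000_000)  # hypothetical token totals
print(f"expected cost: ${calc:.2f}")
# A balance drop of ~1.5x this figure would match the discrepancy
# described above.
```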
Observations:
The discrepancy in the number of requests is extremely large: even accounting for failed requests, the Dashboard figures exceed the real data by more than a factor of two.
This discrepancy likely affects token count accuracy as well.
A key oversight on my part was not documenting this issue thoroughly when I first discovered it; at that time it showed up as a roughly 10-fold acceleration in the depletion of my funds. There is also currently no transparency into how my free token allocation is being consumed, which complicates accurate billing verification. Nonetheless, even excluding free tokens entirely, I have been charged at least 1.5 times more than my documented usage would justify.
I kindly ask OpenAI to urgently investigate and address these significant discrepancies.
Thank you for your attention to this critical matter.
For pending jobs showing up in billing, I wonder if it’s equivalent to “preauthorizing your card”: your input and max_tokens are evaluated against your credit balance to confirm you can pay for the job.
That’s how it works with fine-tuning: you needed enough funds to cover the job even when the job itself was free.
If funds weren’t held back per job, you could submit thousands of $0.10 batches against a $0.10 balance.
Then you may need to wait a while for the billing to settle, just as usage billing can show up only the following day.
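If that guess about the mechanics is right, the hold would be priced against the worst case, i.e. every request consuming its full max_tokens. A speculative sketch of that estimate; the rates and request sizes are placeholders I chose, not documented behavior:

```python
# Speculative sketch of a "preauthorization" hold for a batch:
# price the worst case where every request uses all of max_tokens.
INPUT_RATE = 0.55 / 1_000_000   # USD per input token (assumed)
OUTPUT_RATE = 2.20 / 1_000_000  # USD per output token (assumed)

def worst_case_hold(requests: list[dict]) -> float:
    """Upper bound on cost if every request used its full max_tokens."""
    return sum(r["input_tokens"] * INPUT_RATE + r["max_tokens"] * OUTPUT_RATE
               for r in requests)

# Example: 100 requests of ~1,000 prompt tokens, max_tokens=4096 each.
batch = [{"input_tokens": 1_000, "max_tokens": 4_096}] * 100
print(f"hold: ${worst_case_hold(batch):.2f}")  # hold: $0.96
```

Under this model the hold could far exceed the final charge, which would explain large transient swings in the balance while batches are pending, but not a permanent 1.5x overcharge after settlement.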
I am willing to accept the described mechanics, provided that once the job completes, the resulting figures align with actual usage. As noted in my initial post, there is currently a substantial discrepancy between the number of requests submitted and the tokens actually consumed after all batches completed.