I’m really confused about my OpenAI Batch API usage (GPT-4o-mini). I used 26,898 input tokens and 24,702 output tokens, and the dashboard shows num_requests: 23, major_in_cost: 0.00942795, cost: 0.942795, request_type: batch, and usage_type: text.
Is cost the total price? Why is it so much higher than my own estimate? The Batch API pricing for GPT-4o-mini is:
$0.075 per 1M input tokens
$0.300 per 1M output tokens
Are there any other hidden costs? Based on these rates, the price should be far lower than 0.9 USD. I don’t understand; can someone please help?
It looks like you’re using GPT-4o-mini via the Batch API, which offers a 50% discount compared to the synchronous API. From the token usage you mentioned (26,898 input tokens and 24,702 output tokens), the calculated cost should be:
Input: 26,898 × $0.075 / 1M = $0.00201735
Output: 24,702 × $0.300 / 1M = $0.00741060
Total: $0.00942795
So, the total cost should be around 0.00943 USD. However, you’re seeing a cost of 0.942795 USD, which is significantly higher.
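If you want to double-check the arithmetic yourself, here’s a quick sanity check in Python. The rates are just the Batch API prices quoted above, hard-coded as assumptions rather than fetched from anywhere official:

```python
# Rough sanity check of the expected Batch API charge (GPT-4o-mini batch rates,
# as quoted in this thread; verify against the current pricing page).
BATCH_INPUT_PER_1M = 0.075   # USD per 1M input tokens
BATCH_OUTPUT_PER_1M = 0.300  # USD per 1M output tokens

input_tokens = 26_898
output_tokens = 24_702

cost = (input_tokens * BATCH_INPUT_PER_1M
        + output_tokens * BATCH_OUTPUT_PER_1M) / 1_000_000
print(f"expected batch cost: ${cost:.8f}")  # -> expected batch cost: $0.00942795
```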
Here are a few possibilities that might explain the discrepancy:
Batch overhead or minimum cost: There could be an overhead cost for using the Batch API or a minimum fee for processing jobs, even if token usage is relatively small.
Rate limits or batching quirks: The Batch API is designed for high-throughput jobs and has its own rate limits and discounts, but depending on how your requests are structured, processing many requests or very large batches might add costs you didn’t anticipate.
Usage discrepancy: Sometimes, the usage dashboard might show discrepancies or rounding issues, so it’s worth checking in with OpenAI support to get a detailed breakdown of your charges.
Given that the Batch API is tailored for scenarios like embedding large datasets or running evaluations where timing isn’t critical, it’s a great cost-saver; just be mindful of how requests are bundled to avoid unexpected charges.
It might be helpful to check your request structure and perhaps reach out to OpenAI to clarify if there’s an additional cost factor you’re missing. Let us know how it goes!
@darcschnider Thank you for your quick response. The major_in_cost of 0.00942795 displayed on the dashboard matches the calculation. However, the cost of 0.942795 is exactly 100 times higher. Is this just a coincidence, or some kind of minimum price? What could be the difference between major_in_cost and cost? I’ve searched everywhere but couldn’t find any explanation of what these fields mean.
Yeah, I couldn’t see a minimum price either; that there might be one was only a guess. Are you sure you’re using the batch call as described in the link? It isn’t instant and can take up to 24 hours to return responses. If you’re not using that specific call, then you’re just batching your requests through the standard API, which is not the same thing at all and gets no discount.
So check that first, and if that’s not the case then you’d best contact support. If you do, please follow up with a post so we know. Thanks!
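For anyone checking this on their own account: here’s roughly what a genuine Batch API submission looks like with the official openai Python SDK. This is only a minimal sketch; the file name, prompts, and model are placeholders, and you should confirm the details against the current Batch API docs:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Write requests to a JSONL file; each line is one chat completion request.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello!", "What is the Batch API?"])
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# 2. Upload the file and create the batch job (completes within 24 hours).
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)

# Looping over client.chat.completions.create() instead is the *synchronous*
# API, even if you call it in bulk; no batch discount applies there.
```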
I’m encountering a similar issue using GPT-4o with the Batch API. According to the OpenAI usage platform, I used a total of 1 million tokens, which should cost about $1.25. However, I was charged $2.30, which is closer to the pricing of the synchronous API. I would appreciate any clarification on this matter.
API pricing for gpt-4o batch processing:
$1.25 / 1M input tokens; $5.00 / 1M output tokens
Amount I was billed: $2.30 / 1M total tokens
Help me understand why the price of the service is not $1.25.
Calculate the ratio of input to output that would give that pricing.
ChatGPT:
To determine the ratio of input tokens to output tokens that results in a combined price of $2.30 per 1 million total tokens, we can set up an equation using the given prices:
Input tokens cost $1.25 per 1 million.
Output tokens cost $5.00 per 1 million.
Let x be the proportion of input tokens and y be the proportion of output tokens. We know that x + y = 1 since they represent proportions of the total tokens.
The combined cost per 1 million tokens is given by:
1.25x + 5.00y = 2.30
and we have the constraint:
x + y = 1
Let’s solve this system for x and y. Substituting y = 1 − x into the first equation gives 1.25x + 5.00(1 − x) = 2.30, i.e. 3.75x = 2.70, so x = 0.72.
The solution indicates that for the pricing to be $2.30 per 1 million total tokens:
x (proportion of input tokens) = 0.72, or 72%
y (proportion of output tokens) = 0.28, or 28%
This means that 72% of the tokens are input tokens and 28% are output tokens.
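For what it’s worth, the same back-solve takes only a couple of lines of Python (a sketch using just the numbers quoted above):

```python
# Solve 1.25*x + 5.00*y = 2.30 subject to x + y = 1 (gpt-4o batch rates from above).
INPUT_RATE, OUTPUT_RATE = 1.25, 5.00   # USD per 1M tokens
blended = 2.30                         # observed USD per 1M total tokens

# Substitute y = 1 - x and solve the single linear equation for x.
x = (OUTPUT_RATE - blended) / (OUTPUT_RATE - INPUT_RATE)
y = 1 - x
print(f"input share: {x:.2%}, output share: {y:.2%}")
# -> input share: 72.00%, output share: 28.00%
```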