Could someone knowledgeable please explain this to me? When using the GPT API, the number of tokens shown in the logs and the actual amount being billed are quite different. Why do you think that is?
For GPT-4.1 mini (input: $0.40 / output: $1.60 per 1M tokens),
Input: 32,000 tokens = $0.0128
Output: 2,400 tokens = $0.00384
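For reference, the arithmetic behind those two figures is just tokens times the per-1M rate; a minimal sketch (the `request_cost` helper name is mine, not from any SDK):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 0.40, output_per_m: float = 1.60) -> float:
    """Dollar cost of one request at per-1M-token rates (GPT-4.1 mini defaults)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

print(round(request_cost(32_000, 0), 6))   # input side:  0.0128
print(round(request_cost(0, 2_400), 6))    # output side: 0.00384
```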
However, it clearly feels like several dollars are being consumed each time.
In the usage on the platform site, you can go to "Responses and Chat Completions" by clicking above the little graphs, then "group by -> model", and see your daily usage of that model, in input and output tokens.
Then go back to the main usage page, again for the same day period, and choose "group by -> line item", to evaluate the cost of only that model, in dollars.
One place where you are billed more than you sent is vision. This model uses "patches", which can easily reach 1536 tokens per image, and OpenAI then multiplies that usage by 1.62x -> 2489 tokens: about $0.001 per image. Still small, but it can stack up when you re-run conversations that contain images.
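The patch-to-dollars arithmetic above works out like this; note that the 1.62x mini-model multiplier and the 1536-token patch budget are taken from the post itself, so treat them as assumptions:

```python
import math

PATCH_TOKENS = 1536      # per-image patch budget mentioned above (assumed)
MINI_MULTIPLIER = 1.62   # billing multiplier for mini-class models (assumed)
INPUT_PER_M = 0.40       # $ per 1M input tokens, GPT-4.1 mini

billed_tokens = math.ceil(PATCH_TOKENS * MINI_MULTIPLIER)  # 2489
cost = billed_tokens * INPUT_PER_M / 1_000_000
print(billed_tokens, round(cost, 4))   # 2489 0.001
```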
Thank you for explaining. I checked it using the method you suggested, and I found a few interesting things.
First, it seems that gpt-4.1-mini, which appears in the logs, is not actually being used according to the usage data. Instead, the usage shows GPT-5.2 Codex.
So I suspect the model name shown in the logs is incorrect.
Also, the numbers still don't add up.
I went through the logs, and roughly this is what it looks like:
I also took a rough look at both input and output tokens. While 1,000 tokens is on the higher side, there are quite a lot of requests with around 700 tokens, or even 100 tokens.
So it's fair to say that the token count above is actually on the high side.
But the billed amount is completely different, right?
What could be causing this?
I'm using gpt-5.2-codex for coding, so I don't think it's wrong that I'm being billed for gpt-5.2-codex. I also think the gpt-4.1-mini entry shown in the logs is just a display bug.
What I can't understand is that, as shown in the calculation above, even assuming the usage is entirely gpt-5.2-codex, the numbers don't come close to adding up.
Note: I see a $0 for gpt-4.1-mini on the day you show the calls, at the bottom of the graph in the legend where values are shown. You're being billed less than the API calls you are making.
Are you enrolled in "Share inputs and outputs with OpenAI" for training, to get daily complimentary tokens on chat models? That could be why you have the opposite of what you report: essentially no bill for gpt-4.1-mini on 2026-02-06.
If you want your own retrieval of costs, in buckets by day, you can create an ADMIN key to use the organization admin endpoints.
Here's my own "getting started" script for you; I've shared help for both "costs" and "usage" retrieval (essentially helping other API users answer similar mysteries):
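Since the script itself isn't reproduced in this thread, here is a minimal sketch of what such a retrieval can look like against the organization Costs endpoint (`/v1/organization/costs`). The endpoint path and query parameters are based on my reading of the Admin API docs, and the `OPENAI_ADMIN_KEY` environment variable name is my own convention, so verify both before relying on this:

```python
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

def build_costs_query(start_ts: int, days: int) -> dict:
    """Query parameters for daily cost buckets, grouped per line item."""
    return {
        "start_time": start_ts,       # unix seconds
        "bucket_width": "1d",         # one bucket per day
        "group_by": "line_item",      # split cost per model/line item
        "limit": days,
    }

def fetch_costs(days: int = 7) -> dict:
    # Requires an *admin* key, not a normal project API key.
    admin_key = os.environ["OPENAI_ADMIN_KEY"]
    start = datetime.now(timezone.utc) - timedelta(days=days)
    params = urllib.parse.urlencode(build_costs_query(int(start.timestamp()), days))
    req = urllib.request.Request(
        f"https://api.openai.com/v1/organization/costs?{params}",
        headers={"Authorization": f"Bearer {admin_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    for bucket in fetch_costs()["data"]:
        print(bucket["start_time"], bucket.get("results", []))
```

The same pattern works for the usage endpoints (e.g. per-model token counts) by swapping the path and group-by field.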
More importantly, as shown in the screenshot, the logs indicate that gpt-5.2-codex was never used at all.
In my own code, the model I explicitly specify for coding is gpt-5.2-codex. However, for some reason, the logs show that gpt-4.1-mini was used instead, while the usage/billing is charged to gpt-5.2-codex.
Looking at the log contents themselves, the commands I gave to the LLM clearly correspond to the entries labeled as gpt-4.1-mini.
Because of this, I believe that the gpt-4.1-mini label in the logs is a bug, and that the model actually used is gpt-5.2-codex.
And even assuming that gpt-5.2-codex is indeed being used, as mentioned earlier, the discrepancy in the billed amount is far too large.
Nothing is getting added to the logs in a timely fashion right now, as expected from this flaky log service that needs constant "it's broken again" prodding, so I can't report on the quality of model ID reporting at the moment. I called with gpt-5.2-codex and gpt-5.2 with a deliberate "store": true... and see no evidence of them.
You can see my previous "store" calls, retained for posterity: they are from 10 months ago, which shows how much they exceed the documentation's "default 30 days". Normally I avoid storing.
More hours later - still nothing new in Responses' logs...