Token usage fees for GPT-4.1 Mini and the actual amount billed are wildly inconsistent

Could someone knowledgeable please explain this to me? When using the GPT API, the number of tokens shown in the logs and the actual amount being billed are quite different. Why do you think that is?

For GPT-4.1 mini (input: $0.40 / output: $1.60 per 1M tokens):
Input: 32,000 tokens = $0.0128
Output: 2,400 tokens = $0.00384
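Worked out explicitly, as a sanity check of the arithmetic above (prices in USD per 1M tokens, per the post):

```python
# Per-call cost check for gpt-4.1-mini, using the post's prices.
INPUT_PRICE = 0.40    # USD per 1M input tokens
OUTPUT_PRICE = 1.60   # USD per 1M output tokens

def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for `tokens` billed at `price_per_million` USD per 1M tokens."""
    return tokens * price_per_million / 1_000_000

input_cost = token_cost(32_000, INPUT_PRICE)    # ~$0.0128
output_cost = token_cost(2_400, OUTPUT_PRICE)   # ~$0.00384
print(input_cost, output_cost, input_cost + output_cost)
```

So a single call of this shape costs under two cents, nowhere near "several dollars".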

However, it clearly feels like several dollars are being consumed each time.


I’m curious where the “feeling” arises?

In the usage page on the platform site, you can go to “Responses and Chat Completions” by clicking above the little graphs, then “group by -> model”, and see your daily usage of that model in input and output tokens.

Then go back to the main usage view, again for the same day period, and choose “group by -> line item” to evaluate the cost of that same model, this time in dollars.

One place where you are billed more than you sent is vision. This model tokenizes images into “patches”, which can easily reach 1,536 tokens per image, and OpenAI then multiplies that usage by 1.62x, to about 2,489 billed tokens: roughly $0.001 per image. Still small, but it stacks up when images are resent across conversation turns.
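To put numbers on that (a rough sketch using the figures in the post; the exact rounding OpenAI applies is an assumption here):

```python
import math

# Rough per-image billing for gpt-4.1-mini vision, per the figures above.
patch_tokens = 1536   # patches per image, upper end mentioned in the post
multiplier = 1.62     # mini-model token multiplier
input_price = 0.40    # USD per 1M input tokens

# Rounding up to a whole token is an assumption, chosen to match the
# post's figure of 2,489 billed tokens.
billed_tokens = math.ceil(patch_tokens * multiplier)
cost_per_image = billed_tokens * input_price / 1_000_000

print(billed_tokens)                  # -> 2489
print(round(cost_per_image, 6))       # -> 0.000996, i.e. about $0.001
```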


Thank you for explaining. I checked it using the method you suggested, and I found a few interesting things.

First, according to the usage data, gpt-4.1-mini, the model that appears in the logs, is not actually being used; instead, the usage shows gpt-5.2-codex.
So I suspect the model name shown in the logs is incorrect.

Also, the numbers still don’t add up.

I went through the logs, and roughly this is what it looks like:

For GPT-5.2 Codex (input: $1.75 / output: $14.00 per 1M tokens):

Input: 1,000 tokens × 847 requests = 847,000 tokens = $1.48225
Output: 26 tokens × 847 requests = 22,022 tokens = $0.308308

Total: $1.79

I also took a rough look at both input and output tokens. While 1,000 tokens is on the higher side, there are quite a lot of requests with around 700 tokens, or even 100 tokens.
So it’s fair to say that the token count above is actually on the high side.

But the billed amount is completely different, right?
What could be causing this?

If you are making no use of the gpt-5.2-codex model…then somebody else is, apparently to the tune of $100/day.

Revoke all your API keys across all projects.

Find out where you are leaking API keys where others can discover and abuse your API account.

https://platform.openai.com/docs/guides/production-best-practices#api-keys

Model usage can also come from self-managed conversations sent with “store”: “false”, which won’t show up in the logs of API calls.

I’m using gpt-5.2-codex for coding, so I don’t think it’s wrong that I’m being billed for gpt-5.2-codex. I also think the gpt-4.1-mini entry shown in the logs is just a display bug.

What I can’t understand is that, as shown in the calculation above, even assuming the usage is entirely gpt-5.2-codex, the numbers don’t come close to adding up.

Note: I see $0 for gpt-4.1-mini on the day you show the calls, at the bottom of the graph in the legend where values are shown. You are being billed less than the API calls you are making.

Are you enrolled in “Share inputs and outputs with OpenAI” for training, to get daily complimentary tokens on chat models? That could be why you have the opposite of what you report: essentially no bill for gpt-4.1-mini on 2026-02-06.

https://platform.openai.com/settings/organization/data-controls/sharing

If you want your own retrieval of costs, in buckets by day, you can create an ADMIN key, to use the organization admin endpoints.

Here’s my own “getting started” script for you; I’ve shared help for both “costs” and “usage” retrieval (essentially helping other API users answer similar mysteries):
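(The script itself isn’t reproduced in this thread. For anyone reading along, a minimal sketch of that kind of cost retrieval against the Admin `/v1/organization/costs` endpoint could look like the following; the `OPENAI_ADMIN_KEY` environment variable name and the exact response field names are assumptions to check against the API reference.)

```python
# Hypothetical sketch: fetch daily cost buckets from the OpenAI Admin
# "costs" endpoint, grouped by line item. Requires an admin key, not a
# normal project API key.
import json
import os
import time
import urllib.parse
import urllib.request

COSTS_URL = "https://api.openai.com/v1/organization/costs"

def build_costs_url(days_back: int, now: float) -> str:
    """Build the query URL for the last `days_back` one-day cost buckets."""
    params = {
        "start_time": int(now - days_back * 86400),  # unix seconds
        "bucket_width": "1d",                        # one bucket per UTC day
        "group_by": "line_item",                     # split cost per model/feature
        "limit": days_back,
    }
    return COSTS_URL + "?" + urllib.parse.urlencode(params)

def fetch_costs(admin_key: str, days_back: int = 7) -> dict:
    req = urllib.request.Request(
        build_costs_url(days_back, time.time()),
        headers={"Authorization": f"Bearer {admin_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    key = os.environ.get("OPENAI_ADMIN_KEY")  # assumed env var name
    if key:
        for bucket in fetch_costs(key).get("data", []):
            for item in bucket.get("results", []):
                # "amount"/"line_item" field names assumed per the Admin API docs
                print(bucket["start_time"], item.get("line_item"),
                      item["amount"]["value"])
```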


I’m not receiving any free tokens.

More importantly, as shown in the screenshot, the logs indicate that gpt-5.2-codex was never used at all.
In my own code, the model I explicitly specify for coding is gpt-5.2-codex. However, for some reason, the logs show that gpt-4.1-mini was used instead, while the usage/billing is charged to gpt-5.2-codex.

Looking at the log contents themselves, the commands I gave to the LLM clearly correspond to the entries labeled as gpt-4.1-mini.
Because of this, I believe that the gpt-4.1-mini label in the logs is a bug, and that the model actually used is gpt-5.2-codex.

And even assuming that gpt-5.2-codex is indeed being used, as mentioned earlier, the discrepancy in the billed amount is far too large.

Note this quirk: the times shown in the log list items are unexpectedly normalized to your local time there.

The API usage, however, is delineated at 00:00 UTC.

That means your true daily usage can cross the day boundaries you read directly out of the listing, unless you click through to see a call’s time offset.
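For example, a call made early in the local morning can land in the previous UTC billing day (a sketch assuming JST, UTC+9, as the local zone):

```python
from datetime import datetime, timezone, timedelta

JST = timezone(timedelta(hours=9))  # example local zone (UTC+9)

def utc_billing_day(local_dt: datetime) -> str:
    """Return the UTC date whose 00:00-24:00 billing bucket a call falls in."""
    return local_dt.astimezone(timezone.utc).date().isoformat()

# 08:00 local on Feb 7 is 23:00 UTC on Feb 6, so it bills to Feb 6.
print(utc_billing_day(datetime(2026, 2, 7, 8, 0, tzinfo=JST)))   # -> 2026-02-06
print(utc_billing_day(datetime(2026, 2, 7, 12, 0, tzinfo=JST)))  # -> 2026-02-07
```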

This is how gpt-5.2-codex shows up:

Nothing is getting added to the logs in a timely fashion right now, as expected with this flaky log service that takes constant “it’s broke again” prodding, so I cannot report on the quality of model ID reporting at the moment. I called with gpt-5.2-codex and gpt-5.2 with deliberate “store”: “true”… and see no evidence of the calls.

You can see from my previous “store” items, retained for posterity, how much they violate the documentation’s “default 30 days”: they are from 10 months ago. I normally avoid this.


hours later - still nothing new in Responses’ logs…
