Responses API: STILL overbilling on image inputs (BUG/ISSUE)

Issue: detail setting is not being respected on the Responses API for the majority of models.

I’ve just run through ALL the “vision” API models that accept detail: low.

While Chat Completions bills correctly, the Responses API continues to overbill and fails to deliver low detail.
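
For illustration, here is a minimal sketch of the kind of Responses call being measured (this is not my actual test harness, and the data URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5.1",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "ping"},
            # detail:"low" should cap this image at one base tile of tokens
            {"type": "input_image", "image_url": "data:image/webp;base64,...", "detail": "low"},
        ],
    }],
)
print(resp.usage.input_tokens)  # billed as if detail were "high" on affected models
```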

`"detail": "low"`

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.1-codex-max | - | 353 |
| gpt-5.1-codex | - | 353 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 70 | 353 |
| gpt-5.1-chat-latest | 70 | 353 |
| gpt-5-pro-2025-10-06 | - | 353 |
| gpt-5-codex | - | 353 |
| gpt-5-2025-08-07 | 70 | 353 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 70 | 70 |
| o3-pro-2025-06-10 | - | 85 |
| o3-2025-04-16 | 75 | 75 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 75 |
| o1-2024-12-17 | 75 | 75 |
| gpt-4.1-2025-04-14 | 85 | 368 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 85 | 85 |
| gpt-4o-2024-08-06 | 85 | 85 |
| gpt-4o-mini-2024-07-18 | 85 | 85 |
| gpt-4o-2024-05-13 | 85 | 85 |
| gpt-4-turbo-2024-04-09 | 85 | 85 |

This is on a two-tile input image: a 528x512 WebP (2 tiles at detail: high, or 272 one-token patches times a multiplier on patch-based models). The overbilling is exactly what detail == high would cost.
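
For reference, the expected counts for this image (a sketch; the exact resize and rounding behavior is my reading of the published formulas, not confirmed internals):

```python
import math

w, h = 528, 512
tiles = math.ceil(w / 512) * math.ceil(h / 512)   # 2 detail tiles at detail:high
patches = math.ceil(w / 32) * math.ceil(h / 32)   # 17 * 16 = 272 one-token patches
print(tiles, patches)
# e.g. gpt-4o: detail:high = 85 base + 2 tiles * 170 = 425; detail:low = 85 base only
```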

Contrast this with actually requesting high detail:

`"detail": "high"`

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.1-codex-max | - | 350 |
| gpt-5.1-codex | - | 350 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 350 | 350 |
| gpt-5.1-chat-latest | 350 | 350 |
| gpt-5-pro-2025-10-06 | - | 350 |
| gpt-5-codex | - | 350 |
| gpt-5-2025-08-07 | 350 | 350 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 350 | 350 |
| o3-pro-2025-06-10 | - | 425 |
| o3-2025-04-16 | 375 | 375 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 375 |
| o1-2024-12-17 | 375 | 375 |
| gpt-4.1-2025-04-14 | 425 | 425 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 425 | 425 |
| gpt-4o-2024-08-06 | 425 | 425 |
| gpt-4o-mini-2024-07-18 | 425 | 425 |
| gpt-4o-2024-05-13 | 425 | 425 |
| gpt-4-turbo-2024-04-09 | 425 | 425 |

(Models with dashes are those pointlessly gated off Chat Completions; they could be using my own “code patch” function at my peril, just as I can drop encrypted reasoning into self-managed Responses.)

The scripting I coded to reshape requests for tolerance across all models and endpoints also reverses, with best effort, the “cost multiplier” of the mini and nano “patches” models and gpt-4o-mini back to the underlying input tokens, and classifies the overhead differences between models.
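
A minimal sketch of that multiplier reversal (the function name is mine, and the rounding directions are assumptions, not confirmed billing internals):

```python
import math

def underlying_patches(billed: int, mult: float) -> int:
    # billed image tokens on patch models are the patch count scaled by a
    # per-model multiplier; dividing it back out recovers the patches
    return math.floor(billed / mult)   # best effort: rounding is an assumption

# e.g. o4-mini at 272 patches with a 1.72x multiplier: ceil(272 * 1.72) = 468
print(underlying_patches(468, 1.72))   # 272
```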

o3-pro is also overbilled versus documentation stating that “o3” uses 75-token tiles, which is especially painful given that model’s extreme cost.

This has been going on for months and months, and it even applies to brand-new models. As I stated before, given OpenAI’s inaction after support posted in this same topic: this must be repaired, or at least mitigated. Deliver on the promises of the documentation and the pricing page!

Further note that the documentation is inaccurate. “Specify image input detail level” gives a particular 85-token cost as fact, when the truth is that it is a per-model value varying from 65 to 85 tokens, or one delivering no savings at all on some models:

> You can save tokens and speed up responses by using `"detail": "low"`. This lets the model process the image with a budget of 85 tokens.

That section, and later ones, make no clear statement that “patches” vision models such as gpt-5-mini or gpt-4.1-nano will accept low detail as an input parameter but deliver the full vision product regardless.

4 Likes

Thank you very much for taking the time to report this issue, @_j.

I was able to reproduce this on my end and have forwarded it to the team at OpenAI.

1 Like

What running every model OpenAI has looks like…

Especially peculiar is that the parameter DOES make a difference: you get billed more with detail: low than without it on affected models.

Hey, our engineering team has just deployed a fix. This issue should now be resolved. Thank you!

1 Like

Here is the current state of requesting low detail on that two-tile image, which would otherwise bill 1 + 4 base-size tiles instead of 1. It is generally fixed, with some remaining expectations not met:

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.2-2025-12-11 | 273 | 327 |
| gpt-5.2-pro-2025-12-11 | - | 327 |
| gpt-5.2-chat-latest | 292 | 327 |
| gpt-5.1-codex-max | - | 70 |
| gpt-5.1-codex | - | 70 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 70 | 70 |
| gpt-5.1-chat-latest | 70 | 70 |
| gpt-5-pro-2025-10-06 | - | 70 |
| gpt-5-codex | - | 70 |
| gpt-5-2025-08-07 | 70 | 70 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 70 | 70 |
| o3-pro-2025-06-10 | - | 85 |
| o3-2025-04-16 | 75 | 75 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 75 |
| o1-2024-12-17 | 75 | 75 |
| gpt-4.1-2025-04-14 | 85 | 85 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 85 | 85 |
| gpt-4o-2024-08-06 | 85 | 85 |
| gpt-4o-mini-2024-07-18 | 85 | 85 |
| gpt-4o-2024-05-13 | 85 | 85 |
| gpt-4-turbo-2024-04-09 | 85 | 85 |

Remaining billing anomalies

  1. Overbilling: gpt-5.2 transitions to “patches” vision. Chat Completions bills the correct expected patch count, but on Responses the “mini” billing multiplier of 1.2x still appears to be applied, despite this being a full-price (indeed higher-priced) model; see the arithmetic check after this list.
  2. Overbilling: gpt-5.2-chat shows anomalous extra billing per image on Chat Completions that cannot be reconciled to any extra patch column or row. It is not overhead; it is per image.
  3. Overbilling: o3-pro is “o3”; why is it still billed 85-token tiles instead of 75?
  4. Overbilling: a single extra token of overhead, on Chat Completions only, with gpt-5.2 and gpt-5-mini. This is not an error in my multiplier deobfuscation code, as Responses returns exactly the expected value.
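
A quick arithmetic check of anomaly 1 (the ceiling rounding is my assumption, not confirmed behavior):

```python
import math

patches = 272                      # 528x512 -> ceil(528/32) * ceil(512/32) = 17 * 16
print(math.ceil(patches * 1.2))    # 327: the anomalous Responses figure (1.2x "mini" multiplier)
print(patches + 1)                 # 273: the anomalous Chat Completions figure (+1 overhead)
```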

Anomaly validation of per-image overbilling

Running ten image inputs in a single user message verifies that these are per-image billing issues:

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.2-2025-12-11 | 2730 | 3270 |
| gpt-5.2-chat-latest | 2920 | 3270 |
| gpt-5-mini-2025-08-07 | 2730 | 2720 |
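
A sketch of such a ten-image probe on Chat Completions (this mirrors, rather than reproduces, my script; the data URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
# one low-detail image part, repeated ten times in a single user message
image_part = {
    "type": "image_url",
    "image_url": {"url": "data:image/webp;base64,...", "detail": "low"},
}
response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "ping"}] + [image_part] * 10,
    }],
)
print(response.usage.prompt_tokens)  # the excess scales 10x: it is per image
```
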
Detailed usage log of anomalous model calls

```
— Testing gpt-5.2-2025-12-11 (Chat Completions)
gpt-5.2-2025-12-11 Image usage: 2730 Image prompt tokens: 2730 Total Usage: 2737
pong

input tokens: 2737 output tokens: 4
uncached: 177 non-reasoning: 4
cached: 2560 reasoning: 0

— Testing gpt-5.2-2025-12-11 (Responses)
gpt-5.2-2025-12-11 Image usage: 3270 Image prompt tokens: 3270 Total Usage: 3277
pong

(Also, the image you attached appears to be a solid magenta block with no visible details.)

input tokens: 3277 output tokens: 26
uncached: 717 non-reasoning: 26
cached: 2560 reasoning: 0

— Testing gpt-5.2-chat-latest (Chat Completions)
gpt-5.2-chat-latest Image usage: 2920 Image prompt tokens: 2920 Total Usage: 2927
pong ✅
I’m here.

(Also, the image you sent appears to be a solid brigh

input tokens: 2927 output tokens: 34
uncached: 239 non-reasoning: 34
cached: 2688 reasoning: 0

— Testing gpt-5.2-chat-latest (Responses)
gpt-5.2-chat-latest Image usage: 3270 Image prompt tokens: 3270 Total Usage: 3277
pong ✅

I’m here.
(Also, the image appears to be a solid bright magenta color with no visible details.)

input tokens: 3277 output tokens: 33
uncached: 589 non-reasoning: 33
cached: 2688 reasoning: 0

— Testing gpt-5-mini-2025-08-07 (Chat Completions)
gpt-5-mini-2025-08-07 Image usage: 3280 Image prompt tokens: 2730 Total Usage: 3287
pong — I can see the images. How can I help with them?

input tokens: 3287 output tokens: 24
uncached: 599 non-reasoning: 24
cached: 2688 reasoning: 0

— Testing gpt-5-mini-2025-08-07 (Responses)
gpt-5-mini-2025-08-07 Image usage: 3270 Image prompt tokens: 2720 Total Usage: 3277
Pong — I see the image you uploaded. How can I help with it?

input tokens: 3277 output tokens: 23
uncached: 589 non-reasoning: 23
cached: 2688 reasoning: 0
```


Documentation

  • Images and vision needs an update to its patches-calculation table to include gpt-5.2
  • The description of detail: low there should directly note the variable tile cost per model, and that detail has no effect on patch-based vision models

Caching overbilling for images

Despite sending the same input to the same model throughout these tests, far more uncached context is billed than the documented increments of 128-token blocks would produce.

This additional overbilling despite a cache hit is particularly bad when using Responses in conjunction with vision:

  • a jump from uncached: 239 to 589 on gpt-5.2-chat-latest
  • a jump from uncached: 177 to 717 on gpt-5.2

The overbilling for an alleged cache miss can be directly attributed to the billing formula: the “multiplier” is applied outside of cache accounting:
717 - 177 = 540

  • 10 images to gpt-5.2 on Chat Completions: 2737 input tokens
  • 10 images to gpt-5.2 on Responses: 3277 input tokens
  • A difference of 540 tokens: each image is overbilled by 54 tokens (on top of the +1 each already seen on Chat Completions)

Conclusion: on Responses, OpenAI is not just overbilling by the multiplier, but is also excluding that overbilled amount from the cache discount. A formulaic failure.
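
Reconciling the figures from the logs above makes the failure mode concrete (that the excess lands entirely in the uncached bucket is my inference from these numbers):

```python
cc_total, cc_cached = 2737, 2560      # Chat Completions: billed correctly
resp_total, resp_cached = 3277, 2560  # Responses: multiplier overbilled
excess = resp_total - cc_total        # 540 = 54 extra tokens x 10 images
print(resp_total - resp_cached)       # 717 uncached tokens billed
print(cc_total - cc_cached + excess)  # also 717: the excess escapes the cache discount
```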


Vision model truth table

This Python dict, with a limited set of fields, is what my scripting uses. It records endpoint compatibility, vision method, and cost multiplier (other gates are merely notes). “plus” is the per-request token overhead, while “msg”, the per-message overhead, seems constant across models.

```python
MODEL_CAPABILITIES = {
    # GPT-5.2 Family (reasoning.effort "none" will now tolerate temperature/top_p on non-chat)
    "gpt-5.2-2025-12-11":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.2"},
    "gpt-5.2-pro-2025-12-11":  {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.2-pro"},
    # "chat" in gpt-5.x: Not supporting sampling parameters even at "none", the opposite
    "gpt-5.2-chat-latest":     {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none"},

    # GPT-5.1 Family (reasoning.effort: "none" introduced as default) - any "min_effort" will indicate a reasoning model
    "gpt-5.1-codex-max":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5.1-codex":           {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5.1-codex-mini":      {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.2, "min_effort": "low"},
    "gpt-5.1-2025-11-13":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.1"},
    "gpt-5.1-chat-latest":     {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "medium"},

    # GPT-5 Family (reasoning.effort: "minimal" floor w default "medium", new "verbosity")
    "gpt-5-pro-2025-10-06":    {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "high", "alias": "gpt-5-pro"},
    "gpt-5-codex":             {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5-2025-08-07":        {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5"},
    "gpt-5-mini-2025-08-07":   {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.2, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5-mini"},
    "gpt-5-nano-2025-08-07":   {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.5, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5-nano"},
    "gpt-5-chat-latest":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 70, "sampling": 1},

    # O-Series (reasoning.effort: "low" floor, no verbosity)
    "o3-pro-2025-06-10":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o3-pro"},
    "o3-2025-04-16":           {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o3"},
    "o4-mini-2025-04-16":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.72, "min_effort": "low", "alias": "o4-mini"},
    "o1-pro-2025-03-19":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o1-pro"},
    "o1-2024-12-17":           {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o1"},
    # deep-research in model requires internal RAG tool or web_search with context:medium, no user location
    #"o3-deep-research-2025-06-26": {"cc": 0, "responses": 1, "vision": "tile", "tile": 75, "alias": "o3-deep-research"}, #815t-375,
    #"o4-mini-deep-research-2025-06-26": {"cc": 0, "responses": 1, "vision": "patch", "mult": 1.72, "alias": "o4-mini-deep-research"}, #909t-468

    # GPT-4x Family (Standard Sampling)
    "gpt-4.1-2025-04-14":      {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4.1"},
    "gpt-4.1-mini-2025-04-14": {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "patch", "mult": 1.62, "sampling": 1, "alias": "gpt-4.1-mini"},
    "gpt-4.1-nano-2025-04-14": {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "patch", "mult": 2.46, "sampling": 1, "alias": "gpt-4.1-nano"},
    "gpt-4o-2024-11-20":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1},
    "gpt-4o-2024-08-06":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4o"},
    "gpt-4o-mini-2024-07-18":  {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "mult": 33.333, "sampling": 1, "alias": "gpt-4o-mini"},
    "gpt-4o-2024-05-13":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1},
    "gpt-4-turbo-2024-04-09":  {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4-turbo"},

    # special: "computer-use-preview" takes only screenshot tool return; any CC "search" model takes no images
    #"computer-use-preview":        {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": False, "tile": 65}, # requires truncation:auto
    #"gpt-5-search-api-2025-10-14": {"cc": 1, "responses": 0, "msg": 4, "plus": 2, "vision": False, "alias": "gpt-5-search-api"},
}
```
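
As one usage example (a sketch only; the helper name and the ceil-based patch math are my assumptions, not part of the original script), the table predicts the expected detail: low figures shown above:

```python
import math

def expected_low_detail_tokens(model: str, width: int = 528, height: int = 512) -> int:
    """Predict the underlying detail:low image tokens for the test image."""
    caps = MODEL_CAPABILITIES[model]
    if caps["vision"] == "tile":
        return caps["tile"]  # tile models: low detail bills one base tile
    # patch models ignore detail: one token per 32x32 patch (before multiplier)
    return math.ceil(width / 32) * math.ceil(height / 32)

assert expected_low_detail_tokens("gpt-5-2025-08-07") == 70
assert expected_low_detail_tokens("o3-2025-04-16") == 75
assert expected_low_detail_tokens("gpt-5-mini-2025-08-07") == 272
```
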
3 Likes

Thanks for driving this and getting it (somewhat) fixed @_j! gpt-5.2 still overbilling isn’t much of an issue right now, as it isn’t even capable of recognizing a black-and-white checkerboard in its current state. This is the output I’m getting consistently on the checkerboard test case from the previous thread: “The image appears to be almost entirely black, with no clearly visible objects or details.”