Responses API: STILL overbilling on image inputs (BUG/ISSUE)

Issue: detail setting is not being respected on the Responses API for the majority of models.

I’ve just run through ALL the “vision” API models that accept detail: low.

While Chat Completions bills correctly, the Responses API continues to overbill and fails to deliver low detail.
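
For illustration, here is a minimal sketch of the kind of Responses call being measured (this is not my actual test harness, and the data URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5.1",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "ping"},
            # detail:"low" should cap this image at one base tile of tokens
            {"type": "input_image", "image_url": "data:image/webp;base64,...", "detail": "low"},
        ],
    }],
)
print(resp.usage.input_tokens)  # billed as if detail were "high" on affected models
```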

`"detail": "low"`

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.1-codex-max | - | 353 |
| gpt-5.1-codex | - | 353 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 70 | 353 |
| gpt-5.1-chat-latest | 70 | 353 |
| gpt-5-pro-2025-10-06 | - | 353 |
| gpt-5-codex | - | 353 |
| gpt-5-2025-08-07 | 70 | 353 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 70 | 70 |
| o3-pro-2025-06-10 | - | 85 |
| o3-2025-04-16 | 75 | 75 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 75 |
| o1-2024-12-17 | 75 | 75 |
| gpt-4.1-2025-04-14 | 85 | 368 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 85 | 85 |
| gpt-4o-2024-08-06 | 85 | 85 |
| gpt-4o-mini-2024-07-18 | 85 | 85 |
| gpt-4o-2024-05-13 | 85 | 85 |
| gpt-4-turbo-2024-04-09 | 85 | 85 |

This is on a two-tile input image: a 528x512 WebP (2 tiles at detail: high, or 272 one-token patches times a multiplier on patch-based models). The overbilling is exactly what detail == high would cost.
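
For reference, the expected counts for this image (a sketch; the exact resize and rounding behavior is my reading of the published formulas, not confirmed internals):

```python
import math

w, h = 528, 512
tiles = math.ceil(w / 512) * math.ceil(h / 512)   # 2 detail tiles at detail:high
patches = math.ceil(w / 32) * math.ceil(h / 32)   # 17 * 16 = 272 one-token patches
print(tiles, patches)
# e.g. gpt-4o: detail:high = 85 base + 2 tiles * 170 = 425; detail:low = 85 base only
```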

Contrast this with actually requesting high detail:

`"detail": "high"`

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.1-codex-max | - | 350 |
| gpt-5.1-codex | - | 350 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 350 | 350 |
| gpt-5.1-chat-latest | 350 | 350 |
| gpt-5-pro-2025-10-06 | - | 350 |
| gpt-5-codex | - | 350 |
| gpt-5-2025-08-07 | 350 | 350 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 350 | 350 |
| o3-pro-2025-06-10 | - | 425 |
| o3-2025-04-16 | 375 | 375 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 375 |
| o1-2024-12-17 | 375 | 375 |
| gpt-4.1-2025-04-14 | 425 | 425 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 425 | 425 |
| gpt-4o-2024-08-06 | 425 | 425 |
| gpt-4o-mini-2024-07-18 | 425 | 425 |
| gpt-4o-2024-05-13 | 425 | 425 |
| gpt-4-turbo-2024-04-09 | 425 | 425 |

(Models with dashes are those pointlessly gated off Chat Completions; they could be using my own “code patch” function at my peril, just as I can drop encrypted reasoning into self-managed Responses.)

The scripting I coded to reshape requests for tolerance across all models and endpoints also reverses, with best effort, the “cost multiplier” of the mini and nano “patches” models and gpt-4o-mini back to the underlying input tokens, and classifies the overhead differences between models.
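
A minimal sketch of that multiplier reversal (the function name is mine, and the rounding directions are assumptions, not confirmed billing internals):

```python
import math

def underlying_patches(billed: int, mult: float) -> int:
    # billed image tokens on patch models are the patch count scaled by a
    # per-model multiplier; dividing it back out recovers the patches
    return math.floor(billed / mult)   # best effort: rounding is an assumption

# e.g. o4-mini at 272 patches with a 1.72x multiplier: ceil(272 * 1.72) = 468
print(underlying_patches(468, 1.72))   # 272
```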

o3-pro is also overbilled versus documentation stating that “o3” uses 75-token tiles, which is especially painful given that model’s extreme cost.

This has been going on for months and months, and it even applies to brand-new models. As I stated before, given OpenAI’s inaction after support posted in this same topic: this must be repaired, or at least mitigated. Deliver on the promises of the documentation and the pricing page!

Further note that the documentation is inaccurate. “Specify image input detail level” gives a particular 85-token cost as fact, when the truth is that it is a per-model value varying from 65 to 85 tokens, or one delivering no savings at all on some models:

> You can save tokens and speed up responses by using `"detail": "low"`. This lets the model process the image with a budget of 85 tokens.

That section, and later ones, make no clear statement that “patches” vision models such as gpt-5-mini or gpt-4.1-nano will accept low detail as an input parameter but deliver the full vision product regardless.

4 Likes

Thank you very much for taking the time to report this issue, @_j.

I was able to reproduce this on my end and have forwarded it to the team at OpenAI.

1 Like

What running every model OpenAI has looks like…

Especially peculiar is that the parameter DOES make a difference: you get billed more with detail: low than without it on affected models.

Hey, our engineering team has just deployed a fix. This issue should now be resolved. Thank you!

1 Like

Here is the current state of requesting low detail on that two-tile image, which would otherwise bill 1 + 4 base-size tiles instead of 1. It is generally fixed, with some remaining expectations not met:

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.2-2025-12-11 | 273 | 327 |
| gpt-5.2-pro-2025-12-11 | - | 327 |
| gpt-5.2-chat-latest | 292 | 327 |
| gpt-5.1-codex-max | - | 70 |
| gpt-5.1-codex | - | 70 |
| gpt-5.1-codex-mini | - | 272 |
| gpt-5.1-2025-11-13 | 70 | 70 |
| gpt-5.1-chat-latest | 70 | 70 |
| gpt-5-pro-2025-10-06 | - | 70 |
| gpt-5-codex | - | 70 |
| gpt-5-2025-08-07 | 70 | 70 |
| gpt-5-mini-2025-08-07 | 273 | 272 |
| gpt-5-nano-2025-08-07 | 272 | 272 |
| gpt-5-chat-latest | 70 | 70 |
| o3-pro-2025-06-10 | - | 85 |
| o3-2025-04-16 | 75 | 75 |
| o4-mini-2025-04-16 | 272 | 272 |
| o1-pro-2025-03-19 | - | 75 |
| o1-2024-12-17 | 75 | 75 |
| gpt-4.1-2025-04-14 | 85 | 85 |
| gpt-4.1-mini-2025-04-14 | 272 | 272 |
| gpt-4.1-nano-2025-04-14 | 272 | 272 |
| gpt-4o-2024-11-20 | 85 | 85 |
| gpt-4o-2024-08-06 | 85 | 85 |
| gpt-4o-mini-2024-07-18 | 85 | 85 |
| gpt-4o-2024-05-13 | 85 | 85 |
| gpt-4-turbo-2024-04-09 | 85 | 85 |

Remaining billing anomalies

  1. Overbilling: gpt-5.2 transitions to “patches” vision. Chat Completions bills the correct expected patch count, but on Responses the “mini” billing multiplier of 1.2x still appears to be applied, despite this being a full-price (indeed higher-priced) model; see the arithmetic check after this list.
  2. Overbilling: gpt-5.2-chat shows anomalous extra billing per image on Chat Completions that cannot be reconciled to any extra patch column or row. It is not overhead; it is per image.
  3. Overbilling: o3-pro is “o3”; why is it still billed 85-token tiles instead of 75?
  4. Overbilling: a single extra token of overhead, on Chat Completions only, with gpt-5.2 and gpt-5-mini. This is not an error in my multiplier deobfuscation code, as Responses returns exactly the expected value.
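
A quick arithmetic check of anomaly 1 (the ceiling rounding is my assumption, not confirmed behavior):

```python
import math

patches = 272                      # 528x512 -> ceil(528/32) * ceil(512/32) = 17 * 16
print(math.ceil(patches * 1.2))    # 327: the anomalous Responses figure (1.2x "mini" multiplier)
print(patches + 1)                 # 273: the anomalous Chat Completions figure (+1 overhead)
```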

Anomaly validation of per-image overbilling

Running ten image inputs in a single user message verifies that these are per-image billing issues:

| Model | Chat Completions | Responses |
|---|---|---|
| gpt-5.2-2025-12-11 | 2730 | 3270 |
| gpt-5.2-chat-latest | 2920 | 3270 |
| gpt-5-mini-2025-08-07 | 2730 | 2720 |
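
A sketch of such a ten-image probe on Chat Completions (this mirrors, rather than reproduces, my script; the data URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
# one low-detail image part, repeated ten times in a single user message
image_part = {
    "type": "image_url",
    "image_url": {"url": "data:image/webp;base64,...", "detail": "low"},
}
response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "ping"}] + [image_part] * 10,
    }],
)
print(response.usage.prompt_tokens)  # the excess scales 10x: it is per image
```
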
Detailed usage log of anomalous model calls

```
— Testing gpt-5.2-2025-12-11 (Chat Completions)
gpt-5.2-2025-12-11 Image usage: 2730 Image prompt tokens: 2730 Total Usage: 2737
pong

input tokens: 2737 output tokens: 4
uncached: 177 non-reasoning: 4
cached: 2560 reasoning: 0

— Testing gpt-5.2-2025-12-11 (Responses)
gpt-5.2-2025-12-11 Image usage: 3270 Image prompt tokens: 3270 Total Usage: 3277
pong

(Also, the image you attached appears to be a solid magenta block with no visible details.)

input tokens: 3277 output tokens: 26
uncached: 717 non-reasoning: 26
cached: 2560 reasoning: 0

— Testing gpt-5.2-chat-latest (Chat Completions)
gpt-5.2-chat-latest Image usage: 2920 Image prompt tokens: 2920 Total Usage: 2927
pong ✅
I’m here.

(Also, the image you sent appears to be a solid brigh

input tokens: 2927 output tokens: 34
uncached: 239 non-reasoning: 34
cached: 2688 reasoning: 0

— Testing gpt-5.2-chat-latest (Responses)
gpt-5.2-chat-latest Image usage: 3270 Image prompt tokens: 3270 Total Usage: 3277
pong ✅

I’m here.
(Also, the image appears to be a solid bright magenta color with no visible details.)

input tokens: 3277 output tokens: 33
uncached: 589 non-reasoning: 33
cached: 2688 reasoning: 0

— Testing gpt-5-mini-2025-08-07 (Chat Completions)
gpt-5-mini-2025-08-07 Image usage: 3280 Image prompt tokens: 2730 Total Usage: 3287
pong — I can see the images. How can I help with them?

input tokens: 3287 output tokens: 24
uncached: 599 non-reasoning: 24
cached: 2688 reasoning: 0

— Testing gpt-5-mini-2025-08-07 (Responses)
gpt-5-mini-2025-08-07 Image usage: 3270 Image prompt tokens: 2720 Total Usage: 3277
Pong — I see the image you uploaded. How can I help with it?

input tokens: 3277 output tokens: 23
uncached: 589 non-reasoning: 23
cached: 2688 reasoning: 0
```


Documentation

  • Images and vision needs an update to its patches-calculation table to include gpt-5.2
  • The description of detail: low there should directly note the variable tile cost per model, and that detail has no effect on patch-based vision models

Caching overbilling for images

Despite sending the same input to the same model throughout these tests, far more uncached context is billed than the documented increments of 128-token blocks would produce.

This additional overbilling despite a cache hit is particularly bad when using Responses in conjunction with vision:

  • a jump from uncached: 239 to 589 on gpt-5.2-chat-latest
  • a jump from uncached: 177 to 717 on gpt-5.2

The overbilling for an alleged cache miss can be directly attributed to the billing formula: the “multiplier” is applied outside of cache accounting:
717 - 177 = 540

  • 10 images to gpt-5.2 on Chat Completions: 2737 input tokens
  • 10 images to gpt-5.2 on Responses: 3277 input tokens
  • A difference of 540 tokens: each image is overbilled by 54 tokens (on top of the +1 each already seen on Chat Completions)

Conclusion: on Responses, OpenAI is not just overbilling by the multiplier, but is also excluding that overbilled amount from the cache discount. A formulaic failure.
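
Reconciling the figures from the logs above makes the failure mode concrete (that the excess lands entirely in the uncached bucket is my inference from these numbers):

```python
cc_total, cc_cached = 2737, 2560      # Chat Completions: billed correctly
resp_total, resp_cached = 3277, 2560  # Responses: multiplier overbilled
excess = resp_total - cc_total        # 540 = 54 extra tokens x 10 images
print(resp_total - resp_cached)       # 717 uncached tokens billed
print(cc_total - cc_cached + excess)  # also 717: the excess escapes the cache discount
```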


Vision model truth table

This Python dict, with a limited set of fields, is what my scripting uses. It records endpoint compatibility, vision method, and cost multiplier (other gates are merely notes). “plus” is the per-request token overhead, while “msg”, the per-message overhead, seems constant across models.

```python
MODEL_CAPABILITIES = {
    # GPT-5.2 Family (reasoning.effort "none" will now tolerate temperature/top_p on non-chat)
    "gpt-5.2-2025-12-11":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.2"},
    "gpt-5.2-pro-2025-12-11":  {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.2-pro"},
    # "chat" in gpt-5.x: Not supporting sampling parameters even at "none", the opposite
    "gpt-5.2-chat-latest":     {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.0, "verbosity": 1, "min_effort": "none"},

    # GPT-5.1 Family (reasoning.effort: "none" introduced as default) - any "min_effort" will indicate a reasoning model
    "gpt-5.1-codex-max":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5.1-codex":           {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5.1-codex-mini":      {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.2, "min_effort": "low"},
    "gpt-5.1-2025-11-13":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "none", "alias": "gpt-5.1"},
    "gpt-5.1-chat-latest":     {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "medium"},

    # GPT-5 Family (reasoning.effort: "minimal" floor w default "medium", new "verbosity")
    "gpt-5-pro-2025-10-06":    {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "high", "alias": "gpt-5-pro"},
    "gpt-5-codex":             {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "min_effort": "low"},
    "gpt-5-2025-08-07":        {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 70, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5"},
    "gpt-5-mini-2025-08-07":   {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.2, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5-mini"},
    "gpt-5-nano-2025-08-07":   {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.5, "verbosity": 1, "min_effort": "minimal", "alias": "gpt-5-nano"},
    "gpt-5-chat-latest":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 70, "sampling": 1},

    # O-Series (reasoning.effort: "low" floor, no verbosity)
    "o3-pro-2025-06-10":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o3-pro"},
    "o3-2025-04-16":           {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o3"},
    "o4-mini-2025-04-16":      {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "patch", "mult": 1.72, "min_effort": "low", "alias": "o4-mini"},
    "o1-pro-2025-03-19":       {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o1-pro"},
    "o1-2024-12-17":           {"cc": 1, "responses": 1, "msg": 4, "plus": 2, "vision": "tile", "tile": 75, "min_effort": "low", "alias": "o1"},
    # deep-research in model requires internal RAG tool or web_search with context:medium, no user location
    #"o3-deep-research-2025-06-26": {"cc": 0, "responses": 1, "vision": "tile", "tile": 75, "alias": "o3-deep-research"}, #815t-375,
    #"o4-mini-deep-research-2025-06-26": {"cc": 0, "responses": 1, "vision": "patch", "mult": 1.72, "alias": "o4-mini-deep-research"}, #909t-468

    # GPT-4x Family (Standard Sampling)
    "gpt-4.1-2025-04-14":      {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4.1"},
    "gpt-4.1-mini-2025-04-14": {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "patch", "mult": 1.62, "sampling": 1, "alias": "gpt-4.1-mini"},
    "gpt-4.1-nano-2025-04-14": {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "patch", "mult": 2.46, "sampling": 1, "alias": "gpt-4.1-nano"},
    "gpt-4o-2024-11-20":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1},
    "gpt-4o-2024-08-06":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4o"},
    "gpt-4o-mini-2024-07-18":  {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "mult": 33.333, "sampling": 1, "alias": "gpt-4o-mini"},
    "gpt-4o-2024-05-13":       {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1},
    "gpt-4-turbo-2024-04-09":  {"cc": 1, "responses": 1, "msg": 4, "plus": 3, "vision": "tile", "tile": 85, "sampling": 1, "alias": "gpt-4-turbo"},

    # special: "computer-use-preview" takes only screenshot tool return; any CC "search" model takes no images
    #"computer-use-preview":        {"cc": 0, "responses": 1, "msg": 4, "plus": 2, "vision": False, "tile": 65}, # requires truncation:auto
    #"gpt-5-search-api-2025-10-14": {"cc": 1, "responses": 0, "msg": 4, "plus": 2, "vision": False, "alias": "gpt-5-search-api"},
}
```
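
As one usage example (a sketch only; the helper name and the ceil-based patch math are my assumptions, not part of the original script), the table predicts the expected detail: low figures shown above:

```python
import math

def expected_low_detail_tokens(model: str, width: int = 528, height: int = 512) -> int:
    """Predict the underlying detail:low image tokens for the test image."""
    caps = MODEL_CAPABILITIES[model]
    if caps["vision"] == "tile":
        return caps["tile"]  # tile models: low detail bills one base tile
    # patch models ignore detail: one token per 32x32 patch (before multiplier)
    return math.ceil(width / 32) * math.ceil(height / 32)

assert expected_low_detail_tokens("gpt-5-2025-08-07") == 70
assert expected_low_detail_tokens("o3-2025-04-16") == 75
assert expected_low_detail_tokens("gpt-5-mini-2025-08-07") == 272
```
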
3 Likes

Thanks for driving this and getting it (somewhat) fixed @_j! gpt-5.2 still overbilling isn’t much of an issue right now, as it isn’t even capable of recognizing a black-and-white checkerboard in its current state. This is the output I’m getting consistently on the checkerboard test case from the previous thread: “The image appears to be almost entirely black, with no clearly visible objects or details.”