[Responses API] GPT-5 ignores the detail parameter on image inputs

Since yesterday, the gpt-5 model seems to ignore `"detail": "low"` on the input_image content type.

{"type": "input_image", "image_url": f"data:image/jpeg;base64,{frame}", "detail": "low"}

uses the same amount of tokens as:

{"type": "input_image", "image_url": f"data:image/jpeg;base64,{frame}", "detail": "high"}

This isn’t the case when using o3, and it also wasn’t the case with gpt-5 immediately after release. All tests used the same inputs. I’ve tried a couple of different message structures but always get the same result. Has anyone else run into this?

3 Likes

I’m so sorry to hear about that. It is disappointing when an API doesn’t meet your expectations, and overbills you.

Platform documentation that includes GPT-5 in a table (but not in the header) still reassures us:

A 4096 x 8192 image in "detail": "low" costs at most 85 tokens
Regardless of input size, low detail images are a fixed cost.

Using Chat Completions, today:

— Testing “low”
A black-and-white checkerboard pattern of alternating squares arranged in a grid. CompletionUsage(completion_tokens=24, prompt_tokens=79, total_tokens=103, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

This corresponds to the 70 base tokens for the gpt-5 model.

— Testing “high”
A high-contrast black-and-white checkerboard pattern filling the frame.
CompletionUsage(completion_tokens=23, prompt_tokens=359, total_tokens=382, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

This corresponds to 1x70 + 2x140 = 350 tokens for the two-tile image.
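The tile arithmetic above can be sketched as a small helper. This is a rough estimate only, using the 70-base/140-per-tile figures quoted in this thread, and it ignores the pre-tiling downscale step that applies to very large images:

```python
import math

def gpt5_image_tokens(width: int, height: int, detail: str) -> int:
    """Estimate image input tokens from the tile arithmetic used in
    this thread: 70 base tokens plus 140 per 512px tile at high
    detail, and a flat 70 at low detail."""
    BASE_TOKENS, TILE_TOKENS = 70, 140
    if detail == "low":
        return BASE_TOKENS  # fixed cost regardless of input size
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return BASE_TOKENS + tiles * TILE_TOKENS

# 513x512 checkerboard: 2 tiles -> 70 + 2*140 = 350
print(gpt5_image_tokens(513, 512, "high"))  # 350
print(gpt5_image_tokens(513, 512, "low"))   # 70
```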


Here’s Chat Completions code to do the same, using "type": "image_url" with the image included inline. Refactor it for Responses if you care to submit a bug report to the tracker (oops, there’s none) and a refund invoice for any overbilling discovered.

"""send a checkerboard_513x512.png 2,144 bytes for vision"""
base64_encoded_image = "iVBORw0KGgoAAAANSUhEUgAAAgEAAAIACAIAAACU2CiTAAAIJ0lEQVR4nO3XwQ0cMQwEQdP550xnwJ+wxnVVANLo1dDs7p+XZubp+fbf7L/Zf7P/5/f/fXoBAP8zDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAICueX3B7j49f+btE+y/2X+z/2b/5/v9AwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6ZnffXjDz9Hz7b/bf7L/Z//P7/QMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAuub1Bbv79PyZt0+w/2b/zf6b/Z/v9w8A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6JrdfXvBzNPz7b/Zf7P/Zv/P7/cPAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOia1xfs7tPzZ94+wf6b/Tf7b/Z/vt8/AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBrdvftBTNPz7f/Zv/N/pv9P7/fPwCgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACga15fsLtPz595+wT7b/bf7L/Z//l+/wCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCArtndtxfMPD3f/pv9N/tv9v/8fv8AgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgK55fcHuPj1/5u0T7L/Zf7P/Zv/n+/0DALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALpmd99eMPP0fPtv9t/sv9n/8/v9AwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC65vUFu/v0/Jm3T7D/Zv/N/pv9n+/3DwDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDomt19e8HM0/Ptv9l/s/9m/8/v9w8A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6JrXF+zu0/Nn3j7B/pv9N/tv9n++3z8AoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoGt29+0FM0/Pt/9m/83+m/0/v98/AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBrXl+wu0/Pn3n7BPtv9t/sv9n/+X7/AIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIA/Wf8AoBab6CH9t6EAAAAASUVORK5CYII="
image_parts = [{
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{base64_encoded_image}",
            "detail": "low",  # parameter under test
        },
    }]
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe image briefly"},
            *image_parts,
        ],
    },
]

parameters = {
    "model": "gpt-5",
    "messages": messages,
    "max_completion_tokens": 4000,
    "verbosity": "low",
    "reasoning_effort": "minimal",
}

# Send the request and receive the response
import openai

client = openai.Client()
print("--- Testing")
response = client.chat.completions.create(**parameters)
print(response.choices[0].message.content)
print(response.usage)
2 Likes

Yes, it works correctly with Chat Completions, but the bug is easily reproducible with Responses.

— Testing "low"
A black-and-white checkerboard pattern with alternating squares arranged in a grid.
ResponseUsage(input_tokens=358, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=21, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=379)

From recent testing, gpt-5 isn’t stable or polished enough for production use yet. It’s a shame, since it actually shows improvements for my use cases.

1 Like

Here’s the full repro code for reference:

"""send a checkerboard_513x512.png 2,144 bytes for vision"""
base64_encoded_image = "iVBORw0KGgoAAAANSUhEUgAAAgEAAAIACAIAAACU2CiTAAAIJ0lEQVR4nO3XwQ0cMQwEQdP550xnwJ+wxnVVANLo1dDs7p+XZubp+fbf7L/Zf7P/5/f/fXoBAP8zDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAICueX3B7j49f+btE+y/2X+z/2b/5/v9AwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6ZnffXjDz9Hz7b/bf7L/Z//P7/QMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAuub1Bbv79PyZt0+w/2b/zf6b/Z/v9w8A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6JrdfXvBzNPz7b/Zf7P/Zv/P7/cPAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOia1xfs7tPzZ94+wf6b/Tf7b/Z/vt8/AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBrdvftBTNPz7f/Zv/N/pv9P7/fPwCgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACga15fsLtPz595+wT7b/bf7L/Z//l+/wCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCArtndtxfMPD3f/pv9N/tv9v/8fv8AgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgK55fcHuPj1/5u0T7L/Zf7P/Zv/n+/0DALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALpmd99eMPP0fPtv9t/sv9n/8/v9AwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC65vUFu/v0/Jm3T7D/Zv/N/pv9n+/3DwDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDomt19e8HM0/Ptv9l/s/9m/8/v9w8A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6JrXF+zu0/Nn3j7B/pv9N/tv9n++3z8AoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoGt29+0FM0/Pt/9m/83+m/0/v98/AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBLAwC6NACgSwMAujQAoEsDALo0AKBrXl+wu0/Pn3n7BPtv9t/sv9n/+X7/AIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIAuDQDo0gCALg0A6NIAgC4NAOjSAIA/Wf8AoBab6CH9t6EAAAAASUVORK5CYII="
test_value = "low"
image_parts = [{
        "type": "input_image",
        "image_url": f"data:image/png;base64,{base64_encoded_image}",
        "detail": test_value,  # parameter under test
    }]
messages = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe image briefly"},
            *image_parts,
        ],
    },
]

parameters = {
    "model": "gpt-5",
    "input": messages,
}

# Send the request and receive the response
import openai

client = openai.Client()
print(f'--- Testing "{test_value}"')
response = client.responses.create(**parameters)
print(response.output_text)
print(response.usage)
2 Likes

Did they ever address this (at least that they’re looking into it)?

We’re doing heavy image analysis work and this issue is exploding our token consumption.

Nope, it doesn’t seem to be a priority for them. Even when manually resizing images to fit within 512x512, they still charge an extra tile of 140 tokens. Due to this and frequent hallucinations in our testing, we’ve decided not to switch to the new models until things improve.
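For anyone trying the manual-resize mitigation, this is roughly what I mean. A minimal sketch with PIL (the helper name and JPEG settings are my own choices); it caps the longest side at 512 px so the image fits in a single tile, even if the endpoint ends up billing it as high detail:

```python
import base64
import io

from PIL import Image

def to_single_tile_data_url(path: str, max_side: int = 512) -> str:
    """Downscale so the longest side fits within one 512px tile and
    return a base64 data URL suitable for an image input part."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((max_side, max_side))  # in-place, preserves aspect ratio
    buf = io.BytesIO()
    img.save(buf, "JPEG", quality=85)
    b64 = base64.b64encode(buf.getvalue()).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"
```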

@edwinarbus Is this bug on any team’s radar?

The "detail": "low" setting is not limiting token consumption to 70 per input image for gpt-5 via the Responses API. Tested again today.

Even if it’s not a priority fix, having this on the docket would be greatly appreciated. Thank you!

Thanks for flagging this. We’re aware of an issue in the Responses API with GPT-5 where the detail parameter on image inputs isn’t always applied as expected. The model should support vision, but there are some bugs in how image assets are handled right now, which can lead to parameters being ignored.

Our team is tracking this and working on improvements. In the meantime, if you’re able to share a minimal example (including the request format and request IDs), that would help us correlate with ongoing fixes. A temporary workaround some developers use is passing images as base64 strings outside of tool responses, but we recognize this isn’t a perfect solution.

We’ll share updates as we have them. Thanks for your patience while we work on a fix.

This bot response usage by “support” is getting silly.

Anyone reading this topic is aware of an issue. Anyone getting the results of the AI looking at the image, at high expense, knows the model supports vision. And please explain the logic behind “A temporary workaround some developers use is passing images as base64 strings outside of tool responses”: what does that even mean? The AI writes absolute horse-puckey; nobody here is putting vision messages in a tool role to start with (which is impossible under OpenAI’s input validation), and the “temporary workaround” described is the normal, expected way of sending messages, the very one causing the overbilling.

Again, an AI bot programmed to encourage developers to waste their time: either the representative takes no action, the AI is powerless to take action, or it presents mistruths about anything being done at all. The exact reproduction steps in this topic are simple; scrolling back to the earlier example already shows the request and the high token bill. This response is insulting in its lack of effort.

3 Likes

I will indulge, with an absolute demonstration that gpt-5 bills full price on the Responses API endpoint for images sent for vision with the parameter `"detail": "low"`, while other models work fine.

Python code

Sends a 768x768 image at “low” and asks its color, with the exact same parameters, to three different models.

"""demonstrate OpenAI overcharging for "detail:low" gpt-5 images"""
from openai import OpenAI; c = OpenAI()
def get_webp(rgb_color=(250, 0, 250), size=768) -> str:
    import io, base64
    from PIL import Image
    b = io.BytesIO(); Image.new("RGB", (size, size), rgb_color).save(
        b, "WEBP", lossless=True, quality=100, method=6)
    return base64.b64encode(b.getvalue()).decode("ascii")
req = {
    "input": [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "One word: image color?"},
            {
                "type": "input_image",
                "image_url": f"data:image/webp;base64,{get_webp()}",
                "detail": "low",  # this must charge only 70-85 tokens input
            }
        ]
    }]
}
for model in ["gpt-4o", "gpt-4.1", "gpt-5"]:
    resp = c.responses.with_raw_response.create(model=model, **req)
    r = resp.parse()
    print(f"{model}: {resp.headers.get('x-request-id')}\n{r.id}\n{r.output_text}")
    print(f"Cost: {r.usage.input_tokens} tokens\n---")

(requires PIL image library)

Results with header request ID and “Responses” response_id

gpt-4o: req_ab7163fa5cafae1ea190c91688c808ce
resp_68a73089c4188190af2b03062fd475f3043869f1c73ce74b
Magenta
Cost: 98 tokens
---
gpt-4.1: req_db08552fe69cd2309847d7142b9fab3b
resp_68a7308afc8c8193a653f260ac7d55ce00cf7ef844dda4ca
Magenta
Cost: 98 tokens
---
gpt-5: req_b11f374be0486ab8c238220649d34605
resp_68a7308c08f08196a18cac58fe8ef6a0067e34d73d57b265
magenta
Cost: 639 tokens

That final call:

"usage":{"input_tokens":639,"input_tokens_details":{"cached_tokens":0},"output_tokens":136,"output_tokens_details":{"reasoning_tokens":128},"total_tokens":775}

2 Likes

Hi @_j,

You’re not imagining things (and I promise I’m not a bot 😅 Just a human who made this official Support handle to keep things organized on our end).

The detail parameter on image inputs with GPT-5 in the Responses API isn’t always respected right now. This is a known quirk with how image inputs are processed under the hood. I’ve been able to reproduce it and have created a bug report to continue tracking it internally.

And again, we’ll share updates as we have them! Thanks for your investigation here.

Come to _j’s fuel station. Our sign might say $1.00 a litre, but the $6.00/l to $12.00/l we actually charge, and the $500 billed to your card to fill up your car? Just a known quirk. Same magnitude of overbilling.

Not as big an “oh, well” as the 90% discount we promise on repeat fill-ups yet still billing full price for you to discover (= no caching discount from gpt-5/mini either).

Don’t worry though, it’s high quality fuel.

@OpenAI_Support Can you please have the team check on this again?

This seemed to have been fixed a while back, but the low setting for the detail parameter is being ignored again in the Responses API (with the gpt-5 model) as of yesterday.

Input token usage is again many times the expected amount for images. The 70-token base cap is ignored.

@OpenAI_Support Can also confirm this bug resurfaced recently after being fixed.

@OpenAI_Support Would greatly appreciate any information on the status of this issue. I’m looking to deploy an image-analysis-heavy workflow in the coming weeks, and I’m not sure whether to refactor to use Chat Completions, revert to an older model, or wait for a fix. None of these is ideal, honestly, and even if it’s fixed, I’m worried it might resurface.
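If refactoring to Chat Completions ends up being the stopgap, the content-part conversion is at least mechanical. A minimal sketch (the helper name is mine, and it only covers the input_text/input_image part types used in this thread):

```python
def responses_part_to_chat(part: dict) -> dict:
    """Map a Responses-style content part to the equivalent Chat
    Completions shape (flat image_url string becomes a nested dict)."""
    if part.get("type") == "input_image":
        return {
            "type": "image_url",
            "image_url": {
                "url": part["image_url"],
                "detail": part.get("detail", "auto"),
            },
        }
    if part.get("type") == "input_text":
        return {"type": "text", "text": part["text"]}
    return part  # pass through anything unrecognized

# Example: convert a Responses user message for chat.completions.create
msg = {"role": "user", "content": [
    {"type": "input_text", "text": "Describe image briefly"},
    {"type": "input_image", "image_url": "data:image/png;base64,...",
     "detail": "low"},
]}
msg["content"] = [responses_part_to_chat(p) for p in msg["content"]]
```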

Here is the mandatory fix for OpenAI until you can get your act together over there.

You already multiply the token cost of images by a model-specific factor, such as 1.72x for o4-mini, 2.46x for gpt-5-nano, and even up to 33x for gpt-4o-mini.

Action

Set the vision price multiplier for gpt-5 to 0.077x to deliver the correct pricing equivalent to low detail.

Overbilling comparison

That’s a conservative price correction, based on an image ratio closer to square than a phone picture.

“But that would make high cheaper,” you protest? Well, it makes up for months of overbilling, saying you fixed it, and then overbilling again.

(And the $6 that can be paid to a Sora-2 API call to receive nothing.)

2 Likes

Hi everyone! This should be fixed now. We found we were effectively dropping detail=low, which led us to send image_hr instead of image, so the image was processed as high res and billed accordingly.


Let me know if anyone was overbilled because of this and I'd be happy to help refund you.

3 Likes

This bug is still present on Azure OpenAI for GPT-5.2.

GPT-5.2 uses a “patches” algorithm for vision, not “tiles”. That type of vision input does not have a low detail option.

Of course this is noted nowhere, and the pricing isn’t in the calculators, because the price of those images on gpt-5.2, with no detail: low option, is also about 3x more than gpt-5.1 at high.