GPT-4.1 vision price calculations -- incorrect billing on full model

Billing issue with image tokens on API Playground for gpt-4.1

Regardless of the input image size sent, I get the same input tokens

GPT 4.1 image billing:
blank 1800x2400: 773t up
content: 1800x2400 gif: 773t
content: 2400x1800 png24: 773t
content: 1024x1024 png32: 773t
content: 2400x1800 png24 web: same

gpt-4.1-mini:
content: 2400x1800 png24 web: 2362t (2,352.24 calculated by formula)

Vision pricing tool

It is pointless for me to share a tool (one that OpenAI didn’t provide) when the price actually charged is wrong.

Clarification: this is my web page that simply performs calculations on input image dimensions, to obtain the vision-related figures and costs.

Unless it is happy-discount day (or overbilling day, depending).


I believe there is a limiter for images.

  • If the number of patches exceeds 1536, we scale the image so that it can be covered by no more than 1536 patches.
  • The token cost is the number of patches, capped at a maximum of 1536 tokens

I chose the “bug” category, and not the “I’m confused” category. :grimacing:

There’s not exactly a limiter – there’s a downscaler, but one whose behavior has to be inferred backwards from a single useful example, and then tested with image-size trials against edge cases to see if you got it right.

Then you discover that, whether 1024, 1536, or the example’s 1452 tokens should be billed, the API is reporting about 763. Worse would be if that is what is actually happening: 763 vectorizations instead of full-quality vision.


How GPT-4.1 Calculates Image Tokens (Official Method)

GPT-4.1 calculates token counts for image inputs based on dividing the image into a grid of small patches, each patch being exactly 32 \times 32 pixels. The maximum allowed patch count (thus tokens billed) is 1536.

If the initial number of patches exceeds this maximum, the image is scaled down proportionally (preserving aspect ratio), ensuring it fits within this limit.

Below is the precise calculation and algorithmic logic.


Step 1: Initial Patch Calculation (no scaling)

Given an original image resolution of W \times H pixels, calculate the initial number of patches along width and height:

\text{initialPatchW} = \left\lceil \frac{W}{32} \right\rceil
\text{initialPatchH} = \left\lceil \frac{H}{32} \right\rceil

The total initial patch count is thus:

\text{initialTotal} = \text{initialPatchW} \times \text{initialPatchH}
  • If \text{initialTotal} \leq 1536, no resizing occurs. The token count is exactly \text{initialTotal}.
  • Otherwise, proceed to Step 2 for scaling.
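Step 1 can be sketched in a few lines of Python (the helper name `initial_patches` is mine, for illustration, not an OpenAI API):

```python
import math

# Step 1: initial patch count for a W x H image, with 32x32 patches.
# Helper name `initial_patches` is illustrative, not from any OpenAI SDK.
def initial_patches(w: int, h: int) -> int:
    return math.ceil(w / 32) * math.ceil(h / 32)

# 1024x1024 fits under the 1536-patch cap, so no rescale is needed:
print(initial_patches(1024, 1024))  # 32 * 32 = 1024
print(initial_patches(1800, 2400))  # 57 * 75 = 4275 -> scaling needed
```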

Step 2: Approximate Scaling (preserving aspect ratio)

If scaling is necessary, first apply an approximate scaling factor to bring the total patch count near the allowed maximum (1536):

The scaling factor is computed as follows:

\text{firstScale} = \sqrt{\frac{1536 \times 32^2}{W \times H}}

Applying this scale factor to the image dimensions gives approximate scaled dimensions:

\text{width1} = \lfloor W \times \text{firstScale} \rfloor
\text{height1} = \lfloor H \times \text{firstScale} \rfloor

Compute the intermediate (non-integer) patch dimensions after the first scaling:

\text{patchW1} = \frac{\text{width1}}{32}, \quad \text{patchH1} = \frac{\text{height1}}{32}
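A minimal sketch of this approximate-scaling step, assuming exactly the formula above (the helper name `approx_scale` is mine):

```python
import math

# Step 2: approximate downscale toward the 1536-patch budget.
# `approx_scale` is an illustrative helper, not an official API.
def approx_scale(w: int, h: int, cap: int = 1536, patch: int = 32):
    s = math.sqrt(cap * patch * patch / (w * h))
    w1, h1 = math.floor(w * s), math.floor(h * s)
    # Return scaled dimensions plus the (non-integer) patch counts.
    return w1, h1, w1 / patch, h1 / patch

w1, h1, pw1, ph1 = approx_scale(1800, 2400)
print(w1, h1)    # 1086 1448
print(pw1, ph1)  # 33.9375 45.25
```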

Step 3: Precise Patch Alignment (exact integer patches)

To ensure that patches align exactly to integer counts, perform a precise second scaling step:

  • First, choose width as the reference dimension and set the final width patches to the integer just below the approximate patches:
\text{finalPatchW} = \lfloor \text{patchW1} \rfloor
  • Compute the exact adjustment scaling factor based on this width patch count:
\text{adjustmentScale} = \frac{\text{finalPatchW}}{\text{patchW1}}
  • Apply this exact scaling factor uniformly to both dimensions:
\text{widthFinal} = \lfloor \text{width1} \times \text{adjustmentScale} \rfloor
\text{heightFinal} = \lfloor \text{height1} \times \text{adjustmentScale} \rfloor
  • Now calculate the final height patches as the ceiling division of the new height:
\text{finalPatchH} = \left\lceil \frac{\text{heightFinal}}{32} \right\rceil
  • The total final tokens (patches) are:
\text{finalTokens} = \text{finalPatchW} \times \text{finalPatchH}
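The alignment step above, sketched in Python and fed with the Step 2 result for the 1800×2400 example (`align_patches` is an illustrative name):

```python
import math

# Step 3: snap the width to a whole number of patches, then rescale
# both sides by the same exact factor. Helper name is mine.
def align_patches(w1: int, h1: int, patch: int = 32):
    pw1 = w1 / patch
    final_pw = math.floor(pw1)          # integer width patches
    adjust = final_pw / pw1             # exact adjustment factor
    wf = math.floor(w1 * adjust)
    hf = math.floor(h1 * adjust)
    final_ph = math.ceil(hf / patch)    # integer height patches
    return wf, hf, final_pw, final_ph

print(align_patches(1086, 1448))  # (1056, 1408, 33, 44) -> 33 * 44 = 1452 tokens
```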

Step 4: Final Check (always performed)

Although this case is extremely rare, a final step ensures that the token count does not exceed the maximum:

  • If \text{finalTokens} > 1536, adjust by removing one patch row from height dimension:
\text{finalPatchH} = \text{finalPatchH} - 1
  • Recalculate height dimension and tokens accordingly:
\text{heightFinal} = \text{finalPatchH} \times 32
\text{finalTokens} = \text{finalPatchW} \times \text{finalPatchH}

This guarantees compliance with the 1536 patch limit.


Concrete Example from OpenAI Documentation (1800 \times 2400 pixels):

Step 1: Initial patches

  • Width patches:
\left\lceil \frac{1800}{32} \right\rceil = 57
  • Height patches:
\left\lceil \frac{2400}{32} \right\rceil = 75
  • Total patches: 57 \times 75 = 4275 > 1536, scaling needed.

Step 2: Approximate scaling

  • Compute first scale factor:
\text{firstScale} = \sqrt{\frac{1536 \times 32^2}{1800 \times 2400}} \approx 0.603
  • New dimensions:
\text{width1} = \lfloor 1800 \times 0.603 \rfloor = 1086
\text{height1} = \lfloor 2400 \times 0.603 \rfloor = 1448
  • Intermediate patches:
\text{patchW1} = \frac{1086}{32} \approx 33.94, \quad \text{patchH1} = \frac{1448}{32} \approx 45.25

Step 3: Precise alignment

  • Final integer patches (width reference):
\text{finalPatchW} = \lfloor 33.94 \rfloor = 33
  • Adjustment scale factor:
\text{adjustmentScale} = \frac{33}{33.94} \approx 0.972
  • Precisely adjusted dimensions:
\text{widthFinal} = \lfloor 1086 \times 0.972 \rfloor = 1056
\text{heightFinal} = \lfloor 1448 \times 0.972 \rfloor = 1408
  • Final height patches:
\text{finalPatchH} = \left\lceil \frac{1408}{32} \right\rceil = 44
  • Final tokens:
\text{finalTokens} = 33 \times 44 = 1452

Thus, the final dimensions are exactly 1056 \times 1408 pixels with 33 \times 44 = 1452 tokens.


Pseudocode Summary (for direct programming implementation)

FUNCTION CalculateTokens(W, H):
    initialPatchW = ceil(W / 32)
    initialPatchH = ceil(H / 32)
    initialTotal = initialPatchW × initialPatchH

    IF initialTotal ≤ 1536:
        RETURN (tokens=initialTotal, width=W, height=H)

    firstScale = sqrt((1536 × 32²) / (W × H))
    width1 = floor(W × firstScale)
    height1 = floor(H × firstScale)

    patchW1 = width1 / 32
    finalPatchW = floor(patchW1)
    adjustmentScale = finalPatchW / patchW1

    widthFinal = floor(width1 × adjustmentScale)
    heightFinal = floor(height1 × adjustmentScale)
    finalPatchH = ceil(heightFinal / 32)

    finalTokens = finalPatchW × finalPatchH
    IF finalTokens > 1536:
        finalPatchH = finalPatchH - 1
        heightFinal = finalPatchH × 32
        finalTokens = finalPatchW × finalPatchH

    RETURN (tokens=finalTokens, width=widthFinal, height=heightFinal,
            patchesW=finalPatchW, patchesH=finalPatchH)
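The pseudocode translates directly to runnable Python. This reflects the patch-based calculation reverse-engineered above, not an official OpenAI function:

```python
import math

# Full patch-based token calculation (Steps 1-4), as inferred in this thread.
def calculate_tokens(w: int, h: int, cap: int = 1536, patch: int = 32):
    initial = math.ceil(w / patch) * math.ceil(h / patch)
    if initial <= cap:
        return initial, w, h            # no resizing needed

    # Step 2: approximate downscale toward the patch budget.
    first_scale = math.sqrt(cap * patch * patch / (w * h))
    w1 = math.floor(w * first_scale)
    h1 = math.floor(h * first_scale)

    # Step 3: align width to an integer patch count.
    patch_w1 = w1 / patch
    final_pw = math.floor(patch_w1)
    adjust = final_pw / patch_w1
    wf = math.floor(w1 * adjust)
    hf = math.floor(h1 * adjust)
    final_ph = math.ceil(hf / patch)

    tokens = final_pw * final_ph
    if tokens > cap:                    # Step 4: rare, trim one patch row
        final_ph -= 1
        hf = final_ph * patch
        tokens = final_pw * final_ph
    return tokens, wf, hf

print(calculate_tokens(1800, 2400))  # (1452, 1056, 1408) -- matches the docs example
print(calculate_tokens(1024, 1024))  # (1024, 1024, 1024) -- under the cap, untouched
```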

I see… my bad. I thought you were referring to the image you posted on the “price calculator”.

It seems strange indeed; it doesn’t match the formula in the docs.


Ok, I think I’ve solved what’s going on for you – the documentation is SUPER confusing, but if you look closely at the section headers, “GPT-4.1-mini, GPT-4.1-nano, o4-mini” pricing is handled TOTALLY DIFFERENTLY than “GPT-4o, GPT-4.1, GPT-4o-mini, CUA, and o-series (except o4-mini)”.

You’re looking at the wrong section.

With GPT-4.1, everything is scaled so the shortest side is 768px, [as described here](https://platform.openai.com/docs/guides/images#gpt-4o-gpt-4-1-gpt-4o-mini-cua-and-o-series-except-o4-mini). The upper limit per photo is 765.
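For reference, here is a sketch of that tile-based calculation as the pricing docs describe it for high-detail images: fit within 2048×2048, scale the shortest side to 768px, then bill 85 base tokens plus 170 per 512px tile (the helper name `tile_tokens` is mine):

```python
import math

# Tile-based (GPT-4o-style) high-detail token calculation, per the docs.
def tile_tokens(w: int, h: int) -> int:
    # 1. Fit within a 2048 x 2048 square, preserving aspect ratio.
    if max(w, h) > 2048:
        s = 2048 / max(w, h)
        w, h = int(w * s), int(h * s)
    # 2. Scale so the shortest side is 768px.
    if min(w, h) > 768:
        s = 768 / min(w, h)
        w, h = int(w * s), int(h * s)
    # 3. 85 base tokens + 170 per 512px tile.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(tile_tokens(1800, 2400))  # 765
```

Under this method the thread’s 1800×2400 test image comes out at 765 tokens, which would line up with the ~773 reported once a few tokens of message overhead are included.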

ALSO, there already IS a vision pricing calculator on their website:


It’s hiding at the bottom of https://openai.com/api/pricing/ – just click to expand “How is pricing calculated for images?”


You have discovered something that simply wasn’t there before. They have changed the documentation, again. The header for the patch-based pricing was for the “gpt-4.1 series”. This calls for more line-by-line examination against the previous documentation.

So now you just have to ponder the motivation behind billing the same technology at different pricing, again.

Great, sounds like they fixed their documentation then!

Yes, I’m not sure of the reason for the totally different methods of pricing.

Might be unrelated, but when I tested these near release, it looked like GPT-4.1-mini used roughly twice the input tokens per image, and nano roughly 3x as many as the full-sized GPT-4.1 for the same images.

This is in comparison to GPT-4o-mini, which had a roughly 30x multiplier when inputting images, making image processing essentially cost-comparable to full-size GPT-4o. The 4.1 family should make image processing much, much cheaper than the 4o series, but the multipliers are still kinda weird.