GPT-4o-mini Vision API: High Prompt Token Usage in Batch Process

Yes, OpenAI has acknowledged this as intentional: gpt-4o-mini image input costs the same as on gpt-4o-2024-05-13, which is now double the price of later gpt-4o versions.

Input: 1800 × 2400 px → Resized: 768 × 1024 px → (765 × 33.33 tokens)
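The tile arithmetic in that line can be reproduced with a short Python sketch. It follows the "high detail" tile steps as I understand OpenAI's documentation (fit within 2048 × 2048, shrink so the shortest side is 768 px, then 85 base tokens plus 170 per 512 px tile) and applies the 33.33 gpt-4o-mini multiplier quoted here. Rounding to whole pixels and the function names are my own assumptions.

```python
import math


def tile_resize(width: int, height: int) -> tuple[int, int]:
    """Tile-formula resize: fit within 2048x2048, then shrink so the shortest side is 768 px."""
    w, h = float(width), float(height)
    if max(w, h) > 2048:  # downscale only, keeping the aspect ratio
        scale = 2048 / max(w, h)
        w, h = w * scale, h * scale
    if min(w, h) > 768:  # downscale only
        scale = 768 / min(w, h)
        w, h = w * scale, h * scale
    return round(w), round(h)  # whole pixels (assumption)


def tile_tokens(width: int, height: int) -> int:
    """85 base tokens plus 170 tokens per 512x512 tile of the resized image."""
    w, h = tile_resize(width, height)
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


GPT_4O_MINI_MULTIPLIER = 33.33  # multiplier quoted in this post

base = tile_tokens(1800, 2400)
print(tile_resize(1800, 2400), base, round(base * GPT_4O_MINI_MULTIPLIER))
# (768, 1024) 765 25497
```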

wontfix


The same old tile formula for token calculation is said to apply to the “o” models as well, but billed at their own higher token price, with no multiplier stated.


Bad GPT-4.1 billing not following documentation

There’s new gpt-4.1 pricing, with a new token multiplier for the smaller models:

mini: 1.62x
nano: 2.46x

Following the documentation for gpt-4.1 image pricing and calculations:

Input: 1800 × 2400 px → Resized: 1056 × 1408 px (33 × 44 = 1,452 tokens)

gpt-4o-mini would instead be resized to 768 × 1024 = 4 tiles = (85 × 9 = 765) × 33.33 tokens
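For the gpt-4.1 side of that comparison, here is a sketch of the 32 px patch calculation as I read the gpt-4.1 documentation, including the 1,536-patch cap and the mini/nano multipliers listed above. Rounding the resized image to whole pixels and rounding the multiplied count up are assumptions, and the helper names are hypothetical.

```python
import math

# Multipliers from the list above; plain gpt-4.1 counts patches 1:1.
MULTIPLIERS = {"gpt-4.1": 1.0, "gpt-4.1-mini": 1.62, "gpt-4.1-nano": 2.46}
MAX_PATCHES = 1536


def patch_resize(width: int, height: int) -> tuple[int, int]:
    """Shrink (if needed) so the image is covered by at most MAX_PATCHES 32x32 patches."""
    if math.ceil(width / 32) * math.ceil(height / 32) <= MAX_PATCHES:
        return width, height
    r = math.sqrt(32 * 32 * MAX_PATCHES / (width * height))
    # Shrink a little more so one side lands exactly on a whole number of patches.
    r *= min(
        math.floor(width * r / 32) / (width * r / 32),
        math.floor(height * r / 32) / (height * r / 32),
    )
    return round(width * r), round(height * r)  # whole pixels (assumption)


def patch_tokens(width: int, height: int, model: str = "gpt-4.1") -> int:
    """Patch count (capped) times the model multiplier."""
    w, h = patch_resize(width, height)
    patches = min(math.ceil(w / 32) * math.ceil(h / 32), MAX_PATCHES)
    # round() strips float noise before rounding the multiplied count up (assumption).
    return math.ceil(round(patches * MULTIPLIERS[model], 6))


print(patch_resize(1800, 2400))  # (1056, 1408)
for model in MULTIPLIERS:
    print(model, patch_tokens(1800, 2400, model))
# gpt-4.1 1452, gpt-4.1-mini 2353, gpt-4.1-nano 3572
```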

gpt-4.1: $0.0029 (but billed wrong)
mini: $0.0009
nano: $0.0004
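Those dollar amounts are just the multiplied token count times the input price. The sketch below assumes list input prices of $2.00, $0.40 and $0.10 per million tokens for gpt-4.1, mini and nano (my assumption of the prices in effect, not something derived here) and rounds the multiplied token counts up.

```python
import math

PATCHES = 33 * 44  # 1,452 patches for the 1800 x 2400 example

# model: (token multiplier, ASSUMED USD list price per 1M input tokens)
MODELS = {
    "gpt-4.1": (1.0, 2.00),
    "gpt-4.1-mini": (1.62, 0.40),
    "gpt-4.1-nano": (2.46, 0.10),
}


def image_cost(patches: int, multiplier: float, usd_per_million: float) -> float:
    """Dollar cost of one image: multiplied token count times the per-token price."""
    tokens = math.ceil(round(patches * multiplier, 6))  # round up (assumption)
    return tokens * usd_per_million / 1_000_000


for model, (mult, price) in MODELS.items():
    print(f"{model}: ${image_cost(PATCHES, mult, price):.4f}")
# gpt-4.1: $0.0029, gpt-4.1-mini: $0.0009, gpt-4.1-nano: $0.0004
```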

But billing remains broken on GPT-4.1: it is still being billed as if by the tile formula.

Sending 128×64:

gpt-4.1: 8 tokens calculated → 263t (two images: 518t)
mini: ~13 tokens calculated → 22t (with overhead)
nano: ~20 tokens calculated → 29t
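Subtracting the patch-formula prediction from what was actually billed for the 128×64 image shows which model is out of line. The observed numbers are the ones listed above, the multipliers are the documented ones, and rounding the multiplied count up is an assumption.

```python
import math

# Observed prompt-token billing for one 128x64 image (from the measurements above).
OBSERVED = {"gpt-4.1": 263, "gpt-4.1-mini": 22, "gpt-4.1-nano": 29}
MULTIPLIERS = {"gpt-4.1": 1.0, "gpt-4.1-mini": 1.62, "gpt-4.1-nano": 2.46}


def patch_count(width: int, height: int) -> int:
    """Small images need no downscaling: just count 32x32 patches."""
    return math.ceil(width / 32) * math.ceil(height / 32)


def residuals(width: int, height: int) -> dict[str, int]:
    """Observed billing minus what the documented patch formula predicts."""
    patches = patch_count(width, height)
    return {
        model: OBSERVED[model] - math.ceil(round(patches * MULTIPLIERS[model], 6))
        for model in OBSERVED
    }


print(residuals(128, 64))
# {'gpt-4.1': 255, 'gpt-4.1-mini': 9, 'gpt-4.1-nano': 9}
```

mini and nano are left with a small, constant overhead of 9 tokens; gpt-4.1 is left with 255 tokens, far more than any message overhead.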

Sending 512×512:

gpt-4.1: 256 tokens calculated → 263t
o4-mini: ?? with low reasoning → 449t

Sending 513×513 (+33 tokens calculated):

gpt-4.1: 289 tokens calculated → 773t
o4-mini: ?? with low reasoning → 506t (+57)
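The o4-mini numbers fit the patch formula if you assume a multiplier of 1.72 (the value I believe OpenAI later published for o4-mini; treat it as an assumption here). With that, both measurements leave the same 8-token overhead:

```python
import math

O4_MINI_MULTIPLIER = 1.72  # ASSUMED: later documented o4-mini multiplier
# o4-mini, low reasoning: billed prompt tokens from the measurements above
OBSERVED = {(512, 512): 449, (513, 513): 506}


def o4_mini_image_tokens(width: int, height: int) -> int:
    """Patch count (no downscaling needed at these sizes) times the multiplier, rounded up."""
    patches = math.ceil(width / 32) * math.ceil(height / 32)
    return math.ceil(round(patches * O4_MINI_MULTIPLIER, 6))


for (w, h), billed in OBSERVED.items():
    predicted = o4_mini_image_tokens(w, h)
    print(f"{w}x{h}: predicted {predicted}, billed {billed}, overhead {billed - predicted}")
# 512x512: predicted 441, billed 449, overhead 8
# 513x513: predicted 498, billed 506, overhead 8
```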

Or 2048 × 1336 for maximum billing:

gpt-4.1: 1536 tokens calculated → 1,114t
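Running the same gpt-4.1 sizes through the old tile formula instead shows why I say gpt-4.1 is still billed by tiles: every observation is the tile count plus an overhead of 8 or 9 tokens. The tile steps are as I understand them from the documentation; whole-pixel rounding is an assumption.

```python
import math

# gpt-4.1: billed prompt tokens for a single image, from the measurements above
OBSERVED = {(128, 64): 263, (512, 512): 263, (513, 513): 773, (2048, 1336): 1114}


def tile_tokens(width: int, height: int) -> int:
    """Old tile formula: fit in 2048x2048, shortest side to 768 px, 85 + 170 per 512 px tile."""
    w, h = float(width), float(height)
    if max(w, h) > 2048:  # downscale only
        scale = 2048 / max(w, h)
        w, h = w * scale, h * scale
    if min(w, h) > 768:  # downscale only
        scale = 768 / min(w, h)
        w, h = w * scale, h * scale
    w, h = round(w), round(h)  # whole pixels (assumption)
    return 85 + 170 * math.ceil(w / 512) * math.ceil(h / 512)


for (w, h), billed in OBSERVED.items():
    predicted = tile_tokens(w, h)
    print(f"{w}x{h}: tile formula {predicted}, billed {billed}, difference {billed - predicted}")
# differences: 8, 8, 8, 9
```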

This shows that GPT-4.1 billing is broken, jumping to a much higher price for the smallest change in size (512×512 at 263t vs. 513×513 at 773t), and that o4-mini is also either broken or using the patch formula, which is undocumented for it.


What could explain this billing? Could it be this unwanted injected system message being billed?

Knowledge cutoff: 2024-06

Knowledge cutoff: 2023-10
Image capabilities: Enabled

Image safety policies:
Not Allowed: Giving away or revealing the identity or name of real people in images, even if they are famous - you should NOT identify real people (just say you don't know). Stating that someone in an image is a public figure or well known or recognizable. Saying what someone in a photo is known for or what work they've done. Classifying human-like images as animals. Making inappropriate statements about people in images. Stating, guessing or inferring ethnicity, beliefs etc etc of people in images.
Allowed: OCR transcription of sensitive PII (e.g. IDs, credit cards etc) is ALLOWED. Identifying animated characters.

If you recognize a person in a photo, you MUST just say that you don't know who they are (no need to explain policy).

Your image capabilities:
You cannot recognize people. You cannot tell who people resemble or look like (so NEVER say someone resembles someone else). You cannot see facial structures. You ignore names in image descriptions because you can't tell.

Adhere to this in all languages.

Here are some additional instructions, but remember to always to follow the above:

(your system message)

That comes to about 250 tokens, very similar to the extra amount. However, two 8-token images are billed as two ~254-token images, so the extra cost applies per image, not once per request.
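A quick check of the injection idea, using the gpt-4.1 128×64 numbers measured earlier: if the extra tokens were a one-off per-request injection, a second image should add only its own 8 patch tokens.

```python
# gpt-4.1 figures for 128x64 images, from the measurements above
ONE_IMAGE = 263    # billed prompt tokens with one image
TWO_IMAGES = 518   # billed prompt tokens with two images
PATCH_TOKENS = 8   # documented patch cost of one 128x64 image

# Hypothesis: the extra cost is a fixed per-request injection,
# so a second image should add only its own patch tokens.
predicted_two = ONE_IMAGE + PATCH_TOKENS
# What the second image actually added:
second_image_cost = TWO_IMAGES - ONE_IMAGE

print(predicted_two, second_image_cost)  # 271 255
```

The second image adds 255 tokens rather than 8, so whatever is being billed scales with the number of images.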

The billing is completely messed up.

May update: OpenAI has changed the billing documentation, including which formula applies to which models, more in line with my findings, and now documents a multiplier for o4-mini as well.