If you’re building agentic coding pipelines, subagent workflows in Codex, or latency-sensitive apps, these models should unlock meaningful cost and speed wins right away.
Feel free to drop any feedback or questions below - integration tips, quota details, use-case ideas, or anything else.
Subagents using full models can burn through credits very quickly. It is good to know there are now fast, capable options for handling supporting tasks.
The documentation is wrong. Experimentally, the current behavior is:
| Model | Multiplier | Max Billed Image Tokens | Verified |
|---|---|---|---|
| gpt-5.4-mini | 1.2 | 1843 | API |
| gpt-5.4-nano | 1.2 | 1843 | API |
| gpt-5-mini | 1.2 | 1843 | True |
| gpt-5-nano | 1.5 | 2304 | True |
This includes the billable tokens at "high" per image. "detail":"low" has no effect on these "patches" models, which the documentation obscures.
The vision docs have the wrong multiplier for gpt-5 and gpt-5.4 mini/nano - unless cost is to be stealth-increased. TBD.
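The observed maxima are consistent with a hard 1536-patch ceiling times the per-model multiplier. A quick sketch of that hypothesis - the 1536-patch cap is my assumption, taken from the patch-based vision docs rather than from measurement:

```python
# Hypothesis: max billed image tokens = 1536-patch cap * model multiplier.
PATCH_CAP = 1536

# Multipliers as measured above.
multipliers = {
    "gpt-5.4-mini": 1.2,
    "gpt-5.4-nano": 1.2,
    "gpt-5-mini": 1.2,
    "gpt-5-nano": 1.5,
}

for model, mult in multipliers.items():
    # int() truncates 1843.199... to 1843, matching the observed billing
    print(f"{model}: {int(PATCH_CAP * mult)}")
```

1536 × 1.2 = 1843 and 1536 × 1.5 = 2304 reproduce the billed maxima in the table exactly.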
An API call where I capture the image-only cost, and then de-multiply it back to see whether it agrees with the patches formula:
| model | vision | vision_mult | chat input | calculated | responses input | calculated |
|---|---|---|---|---|---|---|
| gpt-5.4-mini | patch | 1.2 | 527 | 433 | 526 | 432 |
| gpt-5-mini | patch | 1.2 | 527 | 433 | 526 | 432 |
| gpt-5.4-nano | patch | 1.2 | 527 | 433 | 526 | 432 |
| gpt-5-nano | patch | 1.5 | 656 | 432 | 655 | 432 |
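The de-multiply step is plain arithmetic: subtract the text prompt's share of the input tokens, then divide out the vision multiplier. A sketch - the 8-token text baseline here is my assumption for illustration, not a measured value:

```python
def demultiply(input_tokens: int, multiplier: float, text_baseline: int = 8) -> int:
    """Recover the raw patch count from billed input tokens by removing the
    text prompt's share and dividing out the vision multiplier."""
    image_tokens = input_tokens - text_baseline
    return round(image_tokens / multiplier)

# gpt-5-nano, chat input of 656 billed tokens from the table above:
print(demultiply(656, 1.5))  # → 432 patches
```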
Despite the larger vision being designated for "future models" - and gpt-5.4-mini/nano are certainly in the future relative to the original documentation - these small models disallow the larger 2500-patch vision at "high" or the "original" resolution.
What it should look like, given the currently realized costs:
| Model | Multiplier |
|---|---|
| gpt-5.4-mini-2026-03-17 | 1.2x |
| gpt-5.4-nano-2026-03-17 | 1.2x |
| gpt-5.4-2026-03-05 | 1.2x |
| gpt-5.3-codex | 1.2x |
| gpt-5.2-2025-12-11 | 1.2x |
| gpt-5-mini-2025-08-07 | 1.2x |
| gpt-5-nano-2025-08-07 | 1.5x |
| o4-mini-2025-04-16 | 1.72x |
| gpt-4.1-mini-2025-04-14 | 1.62x |
| gpt-4.1-nano-2025-04-14 | 2.46x |
| codex-mini-latest | 1.72x |
The documented maximum vision input size of the new GPT-5.4-mini and nano models is wrong: you can send 1600x1600 and get billed for 50x50 patches = 2500 (I haven't tested "original").
Chat Completions with gpt-5.4-mini and nano is either resizing wrong or billing wrong (cheaper). Here is the result of sending that 1600x1600 px image for 2500 patches/tokens:
| model | vision | mult | ChatC | Ccalculated | Responses | Rcalculated |
|---|---|---|---|---|---|---|
| gpt-5.4 | patch | 1.2 | 2813 | 2338 | 3008 | 2500 |
| gpt-5.4-mini | patch | 1.2 | 2813 | 2338 | 3008 | 2500 |
| gpt-5.4-nano | patch | 1.2 | 2813 | 2338 | 3008 | 2500 |
| gpt-5-mini | patch | 1.2 | 1834 | 1522 | 1833 | 1521 |
| gpt-5-nano | patch | 1.5 | 2290 | 1522 | 2289 | 1521 |
The input "usage" returned by each endpoint is shown in the "ChatC" and "Responses" columns. Image consumption is then obtained from the input-token difference caused by including the image, and then by reversing the apparent multiplier.
If it were downsized, like the original mini and nano: 1248 × 1248 px (39 × 39 patches) = 1521 tokens.
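The documented downsizing arithmetic can be sketched as follows - assuming the 32 px patch size and 1536-patch cap from the patch-based vision docs. It reproduces both the 2500 raw patches the new models actually bill and the 1521 tokens the documented downsizing should produce:

```python
import math

def image_patch_tokens(width: int, height: int, patch: int = 32, cap: int = 1536) -> int:
    """Patch count per the documented algorithm: 32 px patches, with the
    image scaled down whenever the raw count exceeds the 1536-patch cap."""
    raw = math.ceil(width / patch) * math.ceil(height / patch)
    if raw <= cap:
        return raw
    # Shrink so the total patch area fits the cap, preserving aspect ratio.
    sf = math.sqrt(cap * patch * patch / (width * height))
    w, h = width * sf, height * sf
    # Shrink slightly more so the width spans a whole number of patches.
    sf2 = math.floor(w / patch) / (w / patch)
    w, h = w * sf2, h * sf2
    # round() guards against float noise nudging e.g. 39.0 up to 39.0000001
    return math.ceil(round(w / patch, 6)) * math.ceil(round(h / patch, 6))

print(math.ceil(1600 / 32) ** 2)       # 2500 - what the new models actually bill
print(image_patch_tokens(1600, 1600))  # 1521 - what the documented downsizing yields (39 x 39)
```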
This is not the only model disparity in billing between API endpoints. Here is the same image sent to other models, where it should be downsized. The price should be the same between Chat Completions and Responses, but it is not.
| model | vision | vision_mult | chat input | calculated | responses input | calculated |
|---|---|---|---|---|---|---|
| gpt-5.3-chat-latest | patch | 1.2 | 1667 | 1383 | 1833 | 1521 |
| gpt-5.3-codex | patch | 1.2 | - | - | 1833 | 1521 |
| gpt-5.2-chat-latest | patch | 1.2 | 1667 | 1383 | 1833 | 1521 |
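For a sense of the size of that gap, here is the disparity for gpt-5.3-chat-latest computed from the observed usage in the table above (plain arithmetic on reported billed tokens):

```python
# Same image, same model, different endpoint: observed billed input tokens.
chat_input, responses_input = 1667, 1833

discount = 1 - chat_input / responses_input
print(f"Chat Completions bills {discount:.1%} fewer input tokens than Responses")
# prints a discount of about 9.1%
```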
And I have yet another way of sending the same image to Chat Completions - the same vision, at a still cheaper price for me on the "chat" model:
| model | vision | vision_mult | chat input | calculated | responses input | calculated |
|---|---|---|---|---|---|---|
| gpt-5.4 | patch | 1.2 | 2813 | 2338 | E400 | E400 |
| gpt-5.3-chat-latest | patch | 1.2 | 1548 | 1284 | E400 | E400 |
Vision price inflation
Ultimately, when I integrate what the API currently charges into my own calculator - the same image, downsized manually to the old "high" resolution (which doesn't happen as documented) - vision still costs 3x on the new mini and 3.2x on the new nano models.