I’m genuinely frustrated by the current Images & Vision documentation.
It mixes multiple image token accounting systems (BASE/TILE vs PATCH×MULTIPLIER) across different sections without clearly stating which models each rule applies to.
As a result, statements like “detail=low costs 85 tokens” appear to be general rules, while they are actually model-specific (true for GPT-4o, false for 4o-mini or GPT-5-mini).
This is not a minor wording issue: it can easily lead to incorrect cost estimation in production.
At the very least, each cost formula should explicitly list the exact models it applies to. Right now, understanding the pricing requires reverse-engineering the documentation instead of reading it.
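To make the two regimes concrete, here is a minimal sketch of both accounting systems. The GPT-4o tile constants (85 base, 170 per 512px tile) and the resize-then-tile procedure follow the published docs; the patch-regime multiplier shown is the documented GPT-4.1-mini value, but treat all constants as illustrative and verify them against the current pricing page.

```python
import math

def tile_cost(width, height, base=85, per_tile=170, detail="high"):
    """BASE/TILE regime (e.g. GPT-4o): flat base cost at detail=low,
    otherwise base + per_tile for every 512px tile after resizing."""
    if detail == "low":
        return base
    # Downscale to fit within 2048x2048, then shortest side to 768.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles

def patch_cost(width, height, multiplier=1.62, patch=32, cap=1536):
    """PATCH x MULTIPLIER regime (e.g. GPT-4.1-mini): count 32px
    patches, cap the count, then scale by a per-model multiplier."""
    patches = math.ceil(width / patch) * math.ceil(height / patch)
    return int(min(patches, cap) * multiplier)
```

Note that `tile_cost(512, 512, detail="low")` returns a flat 85 only under the BASE/TILE regime; calling `patch_cost` for the same image gives a completely different number, which is exactly the scoping problem described above.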
I’ve done that reverse-engineering myself, even though OpenAI hadn’t and still hasn’t documented it properly: they finally added GPT-5.2 to their own calculator after a month of mystery pricing that truly had to be reverse-engineered, yet entering a resolution there can still return a token count different from what is actually billed.
You can pull the truth table, the resizing algorithms, and the cost multipliers out of the script.
Oh, and over here is a Python table with some of those facts: which models support vision, their vision algorithm, cost multiplier, and cost per tile, plus the endpoint each can run on and the token overhead per message and per call.
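The shape of such a table might look like the sketch below. The field names mirror the facts listed above, but the entries and numbers here are illustrative placeholders, not verified billing data from the actual script.

```python
# Hypothetical capability table; values are placeholders for
# illustration, not verified billing data.
VISION_MODELS = {
    "gpt-4o": {
        "vision": True,
        "algorithm": "tile",    # BASE/TILE accounting
        "base_tokens": 85,
        "tokens_per_tile": 170,
        "cost_multiplier": 1.0,
        "endpoints": ["chat.completions", "responses"],
    },
    "gpt-4.1-mini": {
        "vision": True,
        "algorithm": "patch",   # PATCH x MULTIPLIER accounting
        "patch_size": 32,
        "patch_cap": 1536,
        "cost_multiplier": 1.62,
        "endpoints": ["chat.completions", "responses"],
    },
}

def vision_algorithm(model):
    """Return which accounting regime a model uses, or None if the
    model is not in the table."""
    entry = VISION_MODELS.get(model)
    return entry["algorithm"] if entry else None
```

A lookup keyed by model name makes the scoping explicit: before applying any cost formula, you first ask which regime the model belongs to.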
Thanks, that’s actually very helpful, and it confirms what I was running into.
My confusion wasn’t about whether reverse-engineering is possible. It’s that the official documentation mixes multiple vision pricing regimes without clearly scoping them per model, which makes it hard to know when the documented rules apply and when reverse-engineering is required.
The existence of hidden multipliers (like the GPT-5.2 ×1.2 you mention), undocumented resize behavior, and per-call overhead explains why the billed tokens don’t always line up with the published formulas or even the official calculator.
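A hidden multiplier of that kind is easy to model once you suspect it. A minimal sketch, assuming billed tokens are simply the documented count scaled by a per-model factor and rounded up (the ×1.2 figure comes from the reply above; the rounding rule is my assumption):

```python
import math

def billed_tokens(documented_tokens, hidden_multiplier=1.2):
    # Assumption: the billed count is the documented count scaled by
    # an undocumented per-model multiplier, rounded up to a whole token.
    return math.ceil(documented_tokens * hidden_multiplier)
```

Fitting the multiplier from a handful of (documented, billed) pairs is usually enough to confirm whether a model is on a scaled regime or the formula itself is wrong.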
I’ll take a look at the hotnova script, thanks for sharing it.