First, a plug: a web page to make vision pricing clearer.
- Add an image by upload or URL, or simply simulate its dimensions; add multiple images.
- Select the (practical) model you want the price calculated for.
It just can’t be 100% accurate, though, as we’ll find out…
Here’s how the API Reference would have you send images:
# base64_image: a base64-encoded PNG string
user_doc = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "repeat word from attached image"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}",
                    "detail": "auto"
                }
            },
        ],
    }
]
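For completeness, here is a minimal sketch of producing that base64_image string and sending the request with the official openai Python package; the file name, model, and max_tokens are placeholders, and the encoding would run before building user_doc. The usage.prompt_tokens on the response is the input-token figure the tables below are measuring.

```python
import base64
from openai import OpenAI

# placeholder path: any small PNG containing a word to repeat
with open("word.png", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# user_doc is the message list shown above, built with this base64_image
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=user_doc,
    max_tokens=20,
)
print(response.choices[0].message.content)
print(response.usage.prompt_tokens)  # input tokens: image + text + overhead
```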
But - did you know there’s another way?
- an alternate message format, when you are sending a base64 image:
user_alt1 = [
    {
        "role": "user",
        "content": [
            "repeat word from attached image",
            {
                "image": base64_image,
                "resize": 512,  # this param never worked on released models
            },
        ],
    }
]
There’s also a third format that only worked on gpt-4-vision-preview, and a fourth that only worked on the internal GPT-4 “Be My Eyes” version; neither is accepted any more.
What’s more, these alternate methods would place images differently in the AI’s context. For now, the billing mostly aligns with detail:high.
Note, I said mostly…
Input token consumption for an image plus a five-word text prompt
128x64 image:
Model | detail:low | detail:high | alternate | input price per 1M tokens |
---|---|---|---|---|
o3-2025-04-16 | 86 | 236 | 230 | $10.00 |
o4-mini-2025-04-16 | 26 | 26 | 26 | $1.10 |
gpt-4.1-2025-04-14 | 97 | 267 | 241 | $2.00 |
gpt-4.1-mini-2025-04-14 | 26 | 26 | 26 | $0.40 |
gpt-4.1-nano-2025-04-14 | 33 | 33 | 33 | $0.10 |
gpt-4.5-preview-2025-02-27 | 97 | 267 | 241 | $75.00 |
o1-2024-12-17 | 22 | 44 | 38 | $15.00 |
gpt-4o-2024-11-20 | 97 | 267 | 241 | $2.50 |
gpt-4o-2024-08-06 | 97 | 267 | 241 | $2.50 |
gpt-4o-mini-2024-07-18 | 2845 | 8512 | 2989 | $0.15 |
gpt-4o-2024-05-13 | 97 | 267 | 241 | $5.00 |
gpt-4-turbo-2024-04-09 | 97 | 267 | 241 | $10.00 |
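The gpt-4o, gpt-4.5-preview, and gpt-4-turbo rows follow the long-documented tile formula: detail:low is a flat base charge, while detail:high resizes the image to fit within 2048x2048, shrinks the shortest side to at most 768px, and then adds a per-512px-tile charge. Here is a sketch using the published gpt-4o constants of 85 base and 170 per tile (gpt-4o-mini uses the same shape with 2833 and 5667, hence its enormous counts); the remaining ~12 tokens in each row are the text plus message overhead.

```python
import math

def tile_image_tokens(width: int, height: int, detail: str = "high",
                      base: int = 85, per_tile: int = 170) -> int:
    """Tile-based image tokens for the gpt-4o / gpt-4-turbo family.

    detail:"low" is a flat base charge; detail:"high" scales the image to
    fit within 2048x2048, then caps the shortest side at 768px, and charges
    base + per_tile for every 512x512 tile needed to cover it.
    """
    if detail == "low":
        return base
    scale = min(1.0, 2048 / max(width, height))   # fit within 2048x2048
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))             # shortest side at most 768
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles

# 128x64  -> 1 tile  -> 85 + 170     = 255 image tokens (+ ~12 text/overhead = 267)
# 550x750 -> 4 tiles -> 85 + 170 * 4 = 765 image tokens (+ ~12 = 777)
print(tile_image_tokens(128, 64), tile_image_tokens(550, 750))
```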
gpt-4.1-mini, gpt-4.1-nano, and o4-mini are billed by image “patches”, then scaled by a multiplier factor, which can make images smaller than 512px significantly less expensive.
Apparently, continuing the theme of incorrect documentation, o1 is also billed by “patches”, but it follows detail:low and detail:high differently on big images.
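Here is a sketch of that patch arithmetic for the gpt-4.1-mini, gpt-4.1-nano, and o4-mini rows, assuming the documented 32x32px patches, the 1536-patch cap, and the published per-model multipliers; it reproduces the image portion of those rows, with the remaining ~12-13 tokens being text plus message overhead.

```python
import math

# per-model multipliers from OpenAI's published pricing for patch-billed models
PATCH_MULTIPLIER = {
    "gpt-4.1-mini": 1.62,
    "gpt-4.1-nano": 2.46,
    "o4-mini": 1.72,
}

def patch_image_tokens(width: int, height: int, model: str) -> int:
    """Patch-billed image tokens: 32x32px patches, capped at 1536, then a
    per-model multiplier.  (Images over the cap are actually scaled down to
    fit; the simple min() is a good-enough stand-in at these sizes.)"""
    patches = min(math.ceil(width / 32) * math.ceil(height / 32), 1536)
    return math.ceil(patches * PATCH_MULTIPLIER[model])

# 550x750 -> 18 * 24 = 432 patches
# gpt-4.1-mini: ceil(432 * 1.62) =  700 (+ ~13 text/overhead = 713 in the table)
# gpt-4.1-nano: ceil(432 * 2.46) = 1063 (+ ~13 = 1076)
# o4-mini:      ceil(432 * 1.72) =  744 (+ ~12 = 756)
for m in PATCH_MULTIPLIER:
    print(m, patch_image_tokens(550, 750, m))
```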
550x750 image:
Model | detail:low | detail:high | alternate |
---|---|---|---|
o3-2025-04-16 | 86 | 686 | 647 |
o4-mini-2025-04-16 | 756 | 756 | 756 |
gpt-4.1-2025-04-14 | 97 | 777 | 658 |
gpt-4.1-mini-2025-04-14 | 713 | 713 | 713 |
gpt-4.1-nano-2025-04-14 | 1076 | 1076 | 1076 |
gpt-4.5-preview-2025-02-27 | 97 | 777 | 658 |
o1-2024-12-17 | 22 | 686 | 647 |
gpt-4o-2024-11-20 | 97 | 777 | 658 |
gpt-4o-2024-08-06 | 97 | 777 | 658 |
gpt-4o-mini-2024-07-18 | 2845 | 25513 | 3406 |
gpt-4o-2024-05-13 | 97 | 777 | 658 |
gpt-4-turbo-2024-04-09 | 97 | 777 | 658 |
I will leave it as an exercise for you to explore the idiosyncratic pricing, and the vision quality actually realized, a bit more wherever you see an anomaly…
(also note - only a few of these models are actually affordable and practical for vision input)
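To put “affordable” in rough dollar terms: input cost is just the prompt token count times the per-million price. A quick sketch using the 550x750 detail:high counts and the prices from the first table (input side only; output tokens are billed separately):

```python
# input cost of one 550x750 detail:high request, from the tables above
examples = {
    "gpt-4.1-mini-2025-04-14":    (713,  0.40),
    "gpt-4o-2024-08-06":          (777,  2.50),
    "o3-2025-04-16":              (686, 10.00),
    "gpt-4.5-preview-2025-02-27": (777, 75.00),
}
for model, (tokens, usd_per_million) in examples.items():
    print(f"{model}: ${tokens * usd_per_million / 1_000_000:.5f}")
# roughly $0.0003, $0.0019, $0.0069, and $0.058 per image-bearing prompt
```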