Vision: more than one way to send a request (undocumented), and the different pricing that results

_j · May 16, 2025, 10:56am

First plug: a web page to make vision pricing clearer.

Add an image, either by upload or by URL, or simply simulate its dimensions. You can add multiple images.
Select the (practical) model that you want to see the price calculated for.

It just can’t be 100% accurate, though, as we’ll find out…

Here’s how the API Reference would have you send images:

user_doc = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "repeat word from attached image"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}",
                    "detail": "auto"
                }
            },
        ],
    }
]
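
The `base64_image` name above is left undefined. A minimal sketch of producing it and wrapping it in the documented message shape, assuming a PNG image; the helper names and the `word.png` path are hypothetical, used only for illustration:

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Return the base64 text form of raw image bytes (no data-URL prefix)."""
    return base64.b64encode(raw).decode("ascii")


def build_documented_message(base64_image: str, detail: str = "auto") -> list:
    """Build the documented image_url message shown above, for a PNG image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "repeat word from attached image"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                        "detail": detail,
                    },
                },
            ],
        }
    ]


# Hypothetical usage: read a local file and build the message list.
# with open("word.png", "rb") as f:
#     user_doc = build_documented_message(encode_image_bytes(f.read()))
```

Passing `detail="low"` or `detail="high"` instead of the default gives the other two documented variants.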

But did you know there’s another way, when you are sending a base64 image?

user_alt1 = [
    {
        "role": "user",
        "content": [
            "repeat word from attached image",
            {
                "image": base64_image,
                "resize": 512,  # this param never worked on released models
            },
        ],
    }
]
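
To reproduce this kind of measurement, each message variant can be sent to the same model and the reported prompt tokens compared. A minimal sketch, assuming the official `openai` Python client, where `client.chat.completions.create` returns a response carrying `usage.prompt_tokens`. The helper names are hypothetical, and any output-limit parameter is passed through by the caller, since the accepted name can differ between model families:

```python
def measure_prompt_tokens(client, model: str, messages: list, **extra) -> int:
    """Send one chat completion and return the reported input token count."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **extra,  # e.g. max_tokens=... or max_completion_tokens=..., per model
    )
    return response.usage.prompt_tokens


def compare_variants(client, model: str, variants: dict, **extra) -> dict:
    """Map each variant label to the prompt tokens the API reports for it."""
    return {
        label: measure_prompt_tokens(client, model, messages, **extra)
        for label, messages in variants.items()
    }


# Hypothetical usage (needs an API key and network access):
# from openai import OpenAI
# usage = compare_variants(
#     OpenAI(),
#     "gpt-4o-2024-11-20",
#     {"documented": user_doc, "alternate": user_alt1},
# )
```

Taking the client as an argument keeps the helpers independent of any one SDK version and makes them easy to exercise with a stub.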

There are also a third and a fourth method, one that only worked on gpt-4-vision-preview and one that only worked on the internal GPT-4 “Be My Eyes” version. Neither is accepted any more.

What’s more, these alternate methods would place images differently in the AI’s context. For now, the billing of the alternate method mostly aligns with detail:high.

Note, I said mostly…

Input token consumption of an image plus five-word text

128x64 image:

Model	vision-low	vision-high	alternate	input price/1M tokens
o3-2025-04-16	86	236	230	$10.00
o4-mini-2025-04-16	26	26	26	$1.10
gpt-4.1-2025-04-14	97	267	241	$2.00
gpt-4.1-mini-2025-04-14	26	26	26	$0.40
gpt-4.1-nano-2025-04-14	33	33	33	$0.10
gpt-4.5-preview-2025-02-27	97	267	241	$75.00
o1-2024-12-17	22	44	38	$15.00
gpt-4o-2024-11-20	97	267	241	$2.50
gpt-4o-2024-08-06	97	267	241	$2.50
gpt-4o-mini-2024-07-18	2845	8512	2989	$0.15
gpt-4o-2024-05-13	97	267	241	$5.00
gpt-4-turbo-2024-04-09	97	267	241	$10.00
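
The vision-low and vision-high columns of the larger models are consistent with tile-based accounting: a flat base charge, plus a per-tile charge for each 512px tile at detail:high. A minimal sketch under that assumption. The constants are taken from the published vision pricing guide as best recalled, not from this thread, so treat them as assumptions; the downscaling applied to large images is not modeled, and the function name is hypothetical:

```python
import math

# Assumed (base, per_tile) token constants; not stated in this thread.
TILE_CONSTANTS = {
    "gpt-4o": (85, 170),
    "o3": (75, 150),
    "gpt-4o-mini": (2833, 5667),
}


def tile_tokens(width: int, height: int, family: str, detail: str = "high") -> int:
    """Estimate image tokens for tile-billed models, without downscaling."""
    base, per_tile = TILE_CONSTANTS[family]
    if detail == "low":
        return base  # low detail is a flat charge regardless of image size
    if max(width, height) > 2048 or min(width, height) > 768:
        raise ValueError("downscaling step is not modeled in this sketch")
    # Count the 512px tiles needed to cover the image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return base + per_tile * tiles
```

For the 128x64 image this gives 255 of the measured 267 for gpt-4o, 225 of 236 for o3, and 8500 of 8512 for gpt-4o-mini; the remaining 11 or 12 tokens would be the five-word text plus message overhead. The alternate column does not fit this formula.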

gpt-4.1-mini, gpt-4.1-nano, and o4-mini are billed by “patches”, with the patch count then scaled by a per-model multiplier, which can make images smaller than 512px significantly less expensive.
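
The patch arithmetic can be sketched as follows, assuming 32x32px patches and per-model multipliers of 1.62 (gpt-4.1-mini), 2.46 (gpt-4.1-nano), and 1.72 (o4-mini). Those constants come from the published vision pricing guide as best recalled, not from this thread, and rounding up is an assumption chosen because it matches the measurements. The resizing applied to images above the patch cap is not modeled, and the function name is hypothetical:

```python
import math

# Assumed per-model multipliers; not stated in this thread.
PATCH_MULTIPLIERS = {
    "gpt-4.1-mini": 1.62,
    "gpt-4.1-nano": 2.46,
    "o4-mini": 1.72,
}
PATCH_SIZE = 32     # assumed patch edge length in pixels
MAX_PATCHES = 1536  # assumed cap; larger images are resized first


def patch_tokens(width: int, height: int, model: str) -> int:
    """Estimate billed image tokens for patch-billed models (no resizing)."""
    patches = math.ceil(width / PATCH_SIZE) * math.ceil(height / PATCH_SIZE)
    if patches > MAX_PATCHES:
        raise ValueError("resizing of large images is not modeled in this sketch")
    # Scale the raw patch count by the model's multiplier, rounding up.
    return math.ceil(patches * PATCH_MULTIPLIERS[model])
```

For the 128x64 image (8 patches) this gives 13, 20, and 14 tokens against measured totals of 26, 33, and 26, which leaves 12 or 13 tokens for the text and message overhead.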

Apparently, and again contrary to the documentation, o1 is also billed by “patches”, but it handles detail:low and detail:high differently on big images.

550x750 image:

Model	vision-low	vision-high	alternate
o3-2025-04-16	86	686	647
o4-mini-2025-04-16	756	756	756
gpt-4.1-2025-04-14	97	777	658
gpt-4.1-mini-2025-04-14	713	713	713
gpt-4.1-nano-2025-04-14	1076	1076	1076
gpt-4.5-preview-2025-02-27	97	777	658
o1-2024-12-17	22	686	647
gpt-4o-2024-11-20	97	777	658
gpt-4o-2024-08-06	97	777	658
gpt-4o-mini-2024-07-18	2845	25513	3406
gpt-4o-2024-05-13	97	777	658
gpt-4-turbo-2024-04-09	97	777	658

I will leave it as an exercise for you to explore the idiosyncratic pricing and the vision quality realized a bit more, wherever you see an anomaly…

(Also note: only a few of these models are actually affordable and practical for vision input.)
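
In dollar terms, the input cost of one request is the token count times the per-1M price, divided by 1,000,000. A minimal sketch using the 550x750 vision-high token counts and the prices from the tables in this post (the helper name is hypothetical):

```python
# vision-high tokens for the 550x750 image and USD price per 1M tokens,
# copied from the tables in this post.
MEASURED = {
    "o3-2025-04-16": (686, 10.00),
    "gpt-4.1-2025-04-14": (777, 2.00),
    "gpt-4.1-mini-2025-04-14": (713, 0.40),
    "gpt-4.1-nano-2025-04-14": (1076, 0.10),
    "gpt-4.5-preview-2025-02-27": (777, 75.00),
    "gpt-4o-mini-2024-07-18": (25513, 0.15),
}


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of `tokens` input tokens at the given per-1M price."""
    return tokens * price_per_million / 1_000_000


# List the models from cheapest to most expensive for this one request.
for model, (tokens, price) in sorted(
    MEASURED.items(), key=lambda item: input_cost_usd(*item[1])
):
    print(f"{model:28s} {tokens:6d} tokens  ${input_cost_usd(tokens, price):.6f}")
```

On these numbers, gpt-4o-mini at detail:high costs more per image than gpt-4.1, despite its much lower per-token price.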

merefield · May 16, 2025, 5:59pm

So is this saying that using the alt method, you get a massive discount but no loss of quality?

_j · May 16, 2025, 6:30pm

I was only concerned about showing the consumption. The evaluation I leave to you.

How about documenting that gpt-4o-mini token usage, which was measured on a challenging, undersized image from a forum user? Setting top_p: 0.001 keeps the sampling nearly deterministic, so it should demonstrate a difference in the perception.

“low”

[0]: I’m unable to transcribe text from images directly. However, if you can provide the text in a different format, I’d be happy to help with that!
gpt-4o-mini-2024-07-18 vision token usage: 2849

“high”

[0]: In 1957 s-a constituit guvernul reconcilierii naționale, sub conducerea prințului Suvarna Phouma, la care a participat și Partidul Nou Hatik. În 1960, guvernul de dreapta a organizat o rebeliune militară, iar în 1961, lupta civilă a dus necesar reformelor legislative prin legea de reglementare a problemelor legate (1961-1962).

Luang-Prabang și Cean-Nir (de la sfârșitul sec. al XVIII-lea, de baza Siamului). În 1893, Franța a tratat teritoriile cuprinse în Indochina, sub denumirea de Indochina Franceză.

În 1945, în timpul ocupării japoneze, s-a proclamat independența, dar a fost repede anulată.

Laotienii au fost conduși de un guvern procomunist, iar în 1962 s-a realizat un guvern de coaliție.

Laoțienii (sec. VI e.n.) scriau lucrări „Dao de jing”, scrise în sec. IV-III î.e.n.

Conceptul unei mari unități a fost redat.
gpt-4o-mini-2024-07-18 vision token usage: 25517

“alternate”

[0]: In 1957 s-a constituit guvernul reconcilierii naționale, sub conducerea prințului Suvarna Phouma, la care a participat și Partidul Nou Hakka. În 1960, guvernul a declanșat un program de reabilitare a economiei, dar conflictele interne au dus la o instabilitate politică. Luang-Prabang și Caan-Nir (de la sfârșitul sec. al XVIII-lea, de la baza Siamului) în 1893, au intrat în teritoriile Siamului. Franța a tratat teritoriile ca fiind independente, sub denumirea de Laos. În 1945, după război, s-a proclamat independența, dar provinciile au rămas sub influența franceză. În 1962, s-a constituit un guvern de coaliție.

Laoții (sec. VI i.e.n.) scria în lucrarea „Dao de Jing”, care este considerată una dintre cele mai vechi lucrări filosofice.
gpt-4o-mini-2024-07-18 vision token usage: 3410

What is being extracted from is apparently the following image (not reproduced here).

None are actually satisfactory. The whole image that was sent is not transcribed, and the high word error rate and sentence skipping indeed make this model inappropriate for this OCR task. Still, the output tokens are the same for long enough to make one think the input for “high” and “alternate” is the same.

You can try “what’s in this image” tasks instead of:

user: Produce a transcription of text, ignoring images.
