Vision: more than one way to send a request (one of them undocumented), and different pricing as a result

First, a plug: a web page to make vision pricing clearer.

  • Add an image by upload or URL, or simply simulate its dimensions. You can add multiple images;
  • Select the (practical) model that you want to see the price calculated for.

It can’t be 100% accurate, though, as we’ll find out…

Here’s how the API Reference would have you send images:

user_doc = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "repeat word from attached image"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}",
                    "detail": "auto"
                }
            },
        ],
    }
]
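
The snippet above assumes `base64_image` already exists. Here is a minimal sketch of producing it from raw image bytes and wrapping it in the same message shape. The helper name `build_image_message` and its parameters are my own invention for illustration, not part of any SDK.

```python
import base64


def build_image_message(image_bytes, prompt, mime="image/png", detail="auto"):
    """Hypothetical helper: wrap raw image bytes in the documented message shape."""
    # The data URL needs the base64 text as a str, not bytes.
    base64_image = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{mime};base64,{base64_image}",
                        "detail": detail,
                    },
                },
            ],
        }
    ]


# Stand-in bytes; in practice you would read them with open(path, "rb").read()
user_doc = build_image_message(b"abc", "repeat word from attached image")
```

The `detail` argument is the only thing you would change to compare "low" against "high" consumption.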

But did you know there’s another way?

  • When you are sending a base64 image:
user_alt1 = [
    {
        "role": "user",
        "content": [
            "repeat word from attached image",
            {
                "image": base64_image,
                "resize": 512,  # this param never worked on released models
            },
        ],
    }
]

There are also a third and a fourth method: one that only worked on gpt-4-vision-preview, and one that only worked on the internal GPT-4 “Be My Eyes” version. Neither is accepted any more.


What’s more, these alternate methods would place images differently in the AI’s context. For now, the billing mostly aligns with detail:high.

Note, I said mostly…

Input token consumption of an image plus five-word text

128x64 image:

| Model | vision-low | vision-high | alternate | price per 1M input tokens |
|---|---|---|---|---|
| o3-2025-04-16 | 86 | 236 | 230 | $10.00 |
| o4-mini-2025-04-16 | 26 | 26 | 26 | $1.10 |
| gpt-4.1-2025-04-14 | 97 | 267 | 241 | $2.00 |
| gpt-4.1-mini-2025-04-14 | 26 | 26 | 26 | $0.40 |
| gpt-4.1-nano-2025-04-14 | 33 | 33 | 33 | $0.10 |
| gpt-4.5-preview-2025-02-27 | 97 | 267 | 241 | $75.00 |
| o1-2024-12-17 | 22 | 44 | 38 | $15.00 |
| gpt-4o-2024-11-20 | 97 | 267 | 241 | $2.50 |
| gpt-4o-2024-08-06 | 97 | 267 | 241 | $2.50 |
| gpt-4o-mini-2024-07-18 | 2845 | 8512 | 2989 | $0.15 |
| gpt-4o-2024-05-13 | 97 | 267 | 241 | $5.00 |
| gpt-4-turbo-2024-04-09 | 97 | 267 | 241 | $10.00 |
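
Token counts only tell half the story; what you pay is tokens times the per-token price. Here is a small sketch of that arithmetic using three rows copied from the 128x64 figures above. The function and variable names are mine, and the result covers input tokens only.

```python
# Token counts and prices copied from the 128x64 figures above.
ROWS = {
    "gpt-4o-2024-11-20": {"low": 97, "high": 267, "alternate": 241, "usd_per_1m": 2.50},
    "gpt-4o-mini-2024-07-18": {"low": 2845, "high": 8512, "alternate": 2989, "usd_per_1m": 0.15},
    "gpt-4.1-mini-2025-04-14": {"low": 26, "high": 26, "alternate": 26, "usd_per_1m": 0.40},
}


def request_cost_usd(tokens, usd_per_1m):
    """Input cost in dollars for one request of `tokens` input tokens."""
    return tokens * usd_per_1m / 1_000_000


for model, row in ROWS.items():
    costs = {
        mode: request_cost_usd(row[mode], row["usd_per_1m"])
        for mode in ("low", "high", "alternate")
    }
    print(model, {mode: f"${cost:.7f}" for mode, cost in costs.items()})
```

On these numbers, gpt-4o-mini costs more per image than gpt-4o at both vision-low and vision-high despite its lower per-token price; only the alternate method comes out cheaper for it.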

gpt-4.1-mini, gpt-4.1-nano, and o4-mini are billed by “patches”, with the patch count then scaled by a per-model multiplier, which can make images smaller than 512px significantly less expensive.
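
As a rough sketch of how patch billing could be estimated: count 32x32 patches, multiply by the model's factor, and round up. The patch size, the 1536-patch cap, and the multipliers below are assumptions taken from my reading of the published vision guide; the rounding up is my own inference from the measured numbers, and the function name is hypothetical.

```python
import math

# Assumed per-model multipliers (not confirmed by the measurements in this post).
PATCH_MULTIPLIER = {
    "gpt-4.1-mini": 1.62,
    "gpt-4.1-nano": 2.46,
    "o4-mini": 1.72,
}


def patch_tokens(width, height, model, patch=32, cap=1536):
    """Estimate image tokens for a patch-billed model (sketch only)."""
    patches = math.ceil(width / patch) * math.ceil(height / patch)
    if patches > cap:
        # Larger images are reportedly scaled down first; not modelled here.
        raise ValueError("image exceeds the patch cap; downscaling not modelled")
    # round() guards against float noise before rounding up to a whole token
    return math.ceil(round(patches * PATCH_MULTIPLIER[model], 6))


for size in ((128, 64), (550, 750)):
    print(size, {m: patch_tokens(*size, m) for m in PATCH_MULTIPLIER})
```

Under these assumptions a 128x64 image comes to 13, 20, and 14 tokens and a 550x750 image to 700, 1063, and 744. Subtracting those from the measured totals leaves a constant 12 to 13 tokens, which would be the five-word text plus message overhead.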

Apparently, and in yet another departure from the documentation, o1 is also billed by “patches”, but it treats detail:low and detail:high differently on big images.

550x750 image

| Model | vision-low | vision-high | alternate |
|---|---|---|---|
| o3-2025-04-16 | 86 | 686 | 647 |
| o4-mini-2025-04-16 | 756 | 756 | 756 |
| gpt-4.1-2025-04-14 | 97 | 777 | 658 |
| gpt-4.1-mini-2025-04-14 | 713 | 713 | 713 |
| gpt-4.1-nano-2025-04-14 | 1076 | 1076 | 1076 |
| gpt-4.5-preview-2025-02-27 | 97 | 777 | 658 |
| o1-2024-12-17 | 22 | 686 | 647 |
| gpt-4o-2024-11-20 | 97 | 777 | 658 |
| gpt-4o-2024-08-06 | 97 | 777 | 658 |
| gpt-4o-mini-2024-07-18 | 2845 | 25513 | 3406 |
| gpt-4o-2024-05-13 | 97 | 777 | 658 |
| gpt-4-turbo-2024-04-09 | 97 | 777 | 658 |
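
For the tile-billed models, the vision-high column can be approximated with a base cost plus a per-tile cost over 512px tiles. The sketch below assumes the resize rules and constants from my reading of the published vision guide (85 + 170 per tile for gpt-4o, 2833 + 5667 for gpt-4o-mini, 75 + 150 for o3), and assumes images are only ever scaled down, never up. None of that is confirmed by this post beyond the numbers lining up, and the function name is hypothetical.

```python
import math


def tile_tokens(width, height, base, per_tile, detail="high"):
    """Estimate image tokens for a tile-billed model (sketch only)."""
    if detail == "low":
        return base  # low detail is a flat cost regardless of size
    w, h = float(width), float(height)
    # Step 1: fit inside a 2048x2048 square (downscale only).
    scale = min(1.0, 2048 / max(w, h))
    w, h = w * scale, h * scale
    # Step 2: bring the shortest side down to 768 (downscale only, an assumption).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count the 512px tiles needed to cover the result.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles


print("gpt-4o      128x64 :", tile_tokens(128, 64, 85, 170))
print("gpt-4o      550x750:", tile_tokens(550, 750, 85, 170))
print("gpt-4o-mini 550x750:", tile_tokens(550, 750, 2833, 5667))
print("o3          550x750:", tile_tokens(550, 750, 75, 150))
```

These estimates (255, 765, 25501, 675) sit 11 to 12 tokens under the measured vision-high totals, consistent with the same text overhead on every request. The alternate column does not fit this formula, which is the "mostly" in the billing alignment.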

I will leave it as an exercise for you to explore the idiosyncratic pricing and the vision quality actually realized a bit more wherever you see an anomaly… :laughing:

(Also note: only a few of these models are actually affordable and practical for vision input.)


So is this saying that using the alt method, you get a massive discount but no loss of quality?

I was only concerned about showing the consumption. The evaluation I leave to you.

How about documenting what those gpt-4o-mini tokens buy, using a challenging undersized image from a forum user? With top_p: 0.001 keeping the sampling nearly deterministic, the outputs should demonstrate any difference in perception.

“low”

[0]: I’m unable to transcribe text from images directly. However, if you can provide the text in a different format, I’d be happy to help with that!
gpt-4o-mini-2024-07-18 vision token usage: 2849

“high”

[0]: In 1957 s-a constituit guvernul reconcilierii naționale, sub conducerea prințului Suvarna Phouma, la care a participat și Partidul Nou Hatik. În 1960, guvernul de dreapta a organizat o rebeliune militară, iar în 1961, lupta civilă a dus necesar reformelor legislative prin legea de reglementare a problemelor legate (1961-1962).

Luang-Prabang și Cean-Nir (de la sfârșitul sec. al XVIII-lea, de baza Siamului). În 1893, Franța a tratat teritoriile cuprinse în Indochina, sub denumirea de Indochina Franceză.

În 1945, în timpul ocupării japoneze, s-a proclamat independența, dar a fost repede anulată.

Laotienii au fost conduși de un guvern procomunist, iar în 1962 s-a realizat un guvern de coaliție.

Laoțienii (sec. VI e.n.) scriau lucrări „Dao de jing”, scrise în sec. IV-III î.e.n.

Conceptul unei mari unități a fost redat.
gpt-4o-mini-2024-07-18 vision token usage: 25517

“alternate”

[0]: In 1957 s-a constituit guvernul reconcilierii naționale, sub conducerea prințului Suvarna Phouma, la care a participat și Partidul Nou Hakka. În 1960, guvernul a declanșat un program de reabilitare a economiei, dar conflictele interne au dus la o instabilitate politică. Luang-Prabang și Caan-Nir (de la sfârșitul sec. al XVIII-lea, de la baza Siamului) în 1893, au intrat în teritoriile Siamului. Franța a tratat teritoriile ca fiind independente, sub denumirea de Laos. În 1945, după război, s-a proclamat independența, dar provinciile au rămas sub influența franceză. În 1962, s-a constituit un guvern de coaliție.

Laoții (sec. VI i.e.n.) scria în lucrarea „Dao de Jing”, care este considerată una dintre cele mai vechi lucrări filosofice.
gpt-4o-mini-2024-07-18 vision token usage: 3410

The source being transcribed is apparently:

None of these is actually satisfactory. The full image that was sent is not transcribed, and the high word error rate and skipped sentences indeed make this model inappropriate for this OCR task. Still, the “high” and “alternate” outputs match for long enough to make one think the model receives the same input in both cases.

You can try “what’s in this image” tasks instead of:

user: Produce a transcription of text, ignoring images.
