GPT-4o-mini high vision cost

I noticed that the vision cost for the new mini model is as high as for the normal gpt-4o model.

At first I thought the calculator on the pricing page was wrong, but after testing the API from my Node.js application I can sadly confirm that gpt-4o-mini uses about 33x more tokens for an image while being 33x cheaper per token than gpt-4o.
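
For reference, here is a minimal sketch of how such a comparison can be run with the official OpenAI Node SDK (the image URL is just a placeholder, and exact token counts depend on the image size):

```typescript
// Sketch: send the same image to both models and compare the reported prompt tokens.
// Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
import OpenAI from "openai";

const openai = new OpenAI();
const imageUrl = "https://example.com/test-1280x720.jpg";

async function promptTokensFor(model: string): Promise<number> {
  const response = await openai.chat.completions.create({
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image briefly." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: 50,
  });
  return response.usage?.prompt_tokens ?? 0;
}

(async () => {
  const miniTokens = await promptTokensFor("gpt-4o-mini");
  const omniTokens = await promptTokensFor("gpt-4o");
  console.log({ miniTokens, omniTokens, ratio: miniTokens / omniTokens });
})();
```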

What is the reason behind this? I assume gpt-4o-mini can't do vision natively, so the request is handled by gpt-4o?

Would appreciate some clarification. With vision costs this high on the mini model, I have to keep using gemini-flash / haiku instead.

12 Likes

I'm seeing the same problem. For a 1280x720 image, GPT-4o-mini uses around 37,000 tokens, while Haiku uses about 1,000 tokens. Is this a bug?
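
For a rough sanity check: plugging 1280x720 into the documented gpt-4o image-token formula (85 base tokens plus 170 per 512px tile after resizing) and applying the ~33x multiplier shown for mini in the pricing calculator lands almost exactly on that 37,000 figure. A sketch (the multiplier is an observed ratio, not an official constant):

```typescript
// Back-of-envelope check using the documented gpt-4o high-detail formula
// (85 base tokens + 170 per 512px tile) and an assumed ~33.3x mini multiplier.
function gpt4oImageTokens(width: number, height: number): number {
  // Shortest side is scaled to 768px (the longest side already fits within 2048px here).
  const scale = 768 / Math.min(width, height);
  const w = Math.round(width * scale);
  const h = Math.round(height * scale);
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

const omniTokens = gpt4oImageTokens(1280, 720);      // 3 x 2 tiles -> 85 + 6 * 170 = 1,105
const miniTokens = Math.round(omniTokens * 100 / 3); // ~33.3x -> ~36,833, close to the ~37,000 reported
console.log({ omniTokens, miniTokens });
```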

3 Likes

@truong.nguyen95 without a screenshot to back up what you're saying, this could be taken as advertising…

There's really no reason to advertise. I'm just trying the new model because it's cheaper than Haiku. I'm still using GPT-4o-mini for text, but for now images are processed with Haiku.
(screenshot attached)

2 Likes

This is easy to reproduce by going to the Playground and prompting gpt-4o-mini with an image, so no screenshot is needed.

It would be good to know whether this gpt-4o-mini vision pricing is permanent, or whether a cheaper version will be released soon too.

3 Likes

Can confirm, uploading an image to gpt-4o-mini somehow uses a massive amount of tokens compared to 4o.

The same image across both models: 869 tokens vs. 25,590 (!) tokens.

This has to be a bug, because nothing on the Pricing page or the Vision page mentions this.

Now let’s hope OpenAI sees this and fixes it quickly before people burn all their credits as a result.

EDIT: Not a bug. Vision just costs the same across both models; since the per-token price is lower for 4o-mini, it charges more tokens to even it out.

6 Likes

I don't think it's a bug; it looks intentional. If you open the calculator at https://openai.com/api/pricing/ and go through both models, the pricing detail view shows that the token amount is 33x for mini.

6 Likes

Oh nevermind, you're absolutely right! So this means that despite 4o-mini's lower per-token price, images still cost as much to process as on 4o, resulting in many more tokens used. It evens out.

1 Like

It's probably a bug in the Playground. I used the new model via the API and it's great: faster and cheaper than 3.5 Turbo, and I feel the answers are better. I love this new model.
I will do some tests in the Playground too to see the usage and compare.

This is not a bug. The token consumption for Vision on Mini is roughly 33x higher than it is for Omni.

3 Likes

Even if it is not a bug, there should be a clarifying statement.

To me it looks like a simple multiplication of the actual token count on the backend for billing purposes (since everything is token-based), especially because it evens out: 33x the tokens versus a 33x lower per-token price (see the rough math below). Why would a much smaller model cost the same as the bigger, more performant model for vision tasks? It just doesn't make sense to use the small model for vision tasks unless a lot of additional text tokens (input + output) are involved.

Maybe there is some fixed cost for processing images in general, but the smaller model should still be significantly cheaper (Gemini Flash is also 10x cheaper on vision tasks compared to Gemini Pro).
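
A rough cost-parity check, assuming the list input prices at the time ($5.00 per 1M tokens for gpt-4o, $0.15 per 1M for gpt-4o-mini) and the ~1,105 vs. ~36,833 image token counts from the 1280x720 example above:

```typescript
// Rough cost-parity check; prices and token counts are the figures discussed above.
const omniImageCost = 1_105 * (5.0 / 1_000_000);   // gpt-4o:      ~$0.005525 per image
const miniImageCost = 36_833 * (0.15 / 1_000_000); // gpt-4o-mini: ~$0.005525 per image

// The ~33x token inflation cancels the ~33x lower per-token price almost exactly.
console.log(omniImageCost.toFixed(6), miniImageCost.toFixed(6));
```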

My understanding is they are working on this clarification.

So are they misleading us, manipulating the reported token usage so they can publish lower (but practically meaningless) per-token vision costs?

1 Like

Here is a statement from the OpenAI Head of Developer Experience:

https://x.com/romainhuet/status/1814054938986885550?t=AMFK4svMvCluYqAXUqRDMQ&s=19

So it seems like it works (and costs) as intended.

That makes it not really usable for high-volume vision tasks, which is ironic because it is supposed to be a high-volume model.

I can recommend Gemini Flash: a fixed cost of $0.0001315 per image versus $0.005525 (for a 768px × 1128px image) on gpt-4o / gpt-4o-mini, which is around 40 times cheaper while performing great.
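
For scale, a quick sketch of the per-image arithmetic using the figures quoted in this thread (not official pricing docs):

```typescript
// Per-image cost comparison based on the numbers quoted above.
const geminiFlashPerImage = 0.0001315; // fixed cost per image, as quoted
const gpt4oMiniPerImage = 0.005525;    // 768 x 1128 px image, as quoted

const ratio = gpt4oMiniPerImage / geminiFlashPerImage; // ~42x
console.log(
  `1,000 images: Gemini Flash ~$${(geminiFlashPerImage * 1000).toFixed(2)}, ` +
  `gpt-4o-mini ~$${(gpt4oMiniPerImage * 1000).toFixed(2)} (~${ratio.toFixed(0)}x difference)`
);
```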

2 Likes

Thank you!
After hours of testing, I see some problems with other models too… Assistants do not follow instructions like they did before. :face_with_head_bandage:

So are they just artificially using more tokens to keep the costs consistent between models?

Pretty much. Processing images costs the same in actual money. Text is cheaper, though.

1 Like

I definitely don't fully understand how the pricing and tokens work for images, but does this mean the mini model could actually process images more cheaply (for the user), but they basically won't let it?

Only OpenAI knows what the model can or can't do. But for us, the situation is that images cost the same amount of money on either model.

2 Likes

Well, if anybody from OpenAI is listening, processing images of text should be WAY cheaper. It doesn't make sense to price an image of an invoice the same as an image of the Mona Lisa. How do they expect businesses to embrace this technology when they make the cost of processing their basic documents so prohibitive?

And yes, I know we can extract text from PDFs and images using OCR, but there are many cases, and I can think of one big one, where we could really benefit from letting the model examine the documents first.