I noticed that the vision cost for the new mini model is as high as for the normal gpt-4o model.
At first I thought the calculator on the pricing page was wrong, but after testing the API in my Node.js application I can sadly confirm that gpt-4o-mini uses about 33x more tokens for an image while being 33x cheaper than gpt-4o.
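Roughly what my test looked like, as a minimal sketch (not my exact code; the image URL and prompt are placeholders):

```js
import OpenAI from "openai";

// Sketch of the comparison (Node 18+, ESM). The image URL is a placeholder.
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const imageUrl = "https://example.com/sample-1280x720.jpg";

async function imagePromptTokens(model) {
  const response = await client.chat.completions.create({
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  // usage.prompt_tokens includes the tokens billed for the image.
  return response.usage?.prompt_tokens ?? 0;
}

const miniTokens = await imagePromptTokens("gpt-4o-mini");
const omniTokens = await imagePromptTokens("gpt-4o");
console.log({ miniTokens, omniTokens, ratio: miniTokens / omniTokens }); // ratio comes out around 33
```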
What is the reason behind this? I assume gpt-4o-mini can't do vision natively, so the request is handled by gpt-4o?
Would appreciate some clarification. With vision costs that high for the mini model, I have to keep using gemini-flash / haiku instead.
12 Likes
I'm seeing the same problem. For a 1280x720 image, GPT-4o-mini uses around 37,000 tokens, while Haiku uses about 1,000. Is this a bug?
3 Likes
@truong.nguyen95 without a screenshot to support what you are saying, this could be taken as advertising…
There's no reason to advertise, really. I'm just trying the new model because it's cheaper than Haiku. I'm still using GPT-4o-mini for text, but for now images are processed with Haiku.
2 Likes
This is easy to reproduce by going to the Playground and prompting gpt-4o-mini with an image, so no screenshot is needed.
It would be good to know whether this gpt-4o-mini vision pricing is permanent or whether a cheaper version will be released soon.
3 Likes
Can confirm: uploading an image to gpt-4o-mini somehow uses a massive number of tokens compared to 4o.
The same image across both models: 869 tokens vs. 25,590 (!) tokens.
This has to be a bug, because nothing on the Pricing page or the Vision page mentions this.
Now let’s hope OpenAI sees this and fixes it quickly before people burn all their credits as a result.
EDIT: Not a bug. Vision just costs the same across both models, but since the per-token price is lower for 4o-mini, it counts more tokens to even it out.
6 Likes
I don't think it's a bug; it looks intentional. If you go to the calculator at https://openai.com/api/pricing/ and run the same image through both models, you can see in the pricing detail view that the token amount is 33x higher for mini.
6 Likes
Oh, never mind, you're absolutely right! So despite 4o-mini's lower per-token price, images still cost as much to process as in 4o, resulting in many more tokens used. It evens out.
1 Like
It's probably a bug in the Playground. I used the new model via the API and it's great: faster and cheaper than 3.5 turbo, and I feel the answers are better. I love this new model.
I will run some tests in the Playground too to check the usage and compare.
This is not a bug. The token consumption for Vision on Mini is roughly 33x higher than it is for Omni.
3 Likes
Even if it is not a bug, there should be an official clarification.
To me it seems like a simple multiplication of the actual tokens on the backend for billing purposes (because everything is token based), especially since the 33x token count evens out against the 33x reduction in price. Why would a much smaller model cost the same as the bigger, more performant model for vision tasks? It just does not make sense to use the small model for vision tasks unless a lot of additional text tokens are involved (input + output).
Maybe there is some fixed cost for processing images in general, but the smaller model should still be significantly cheaper (Gemini Flash is also 10x cheaper on vision tasks compared to Gemini Pro).
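If it really is just a flat multiplier, the numbers in this thread line up. Here is a rough sketch: the gpt-4o tile formula (85 base tokens plus 170 per 512px tile after resizing) is the one documented for vision, while the ~33.3x factor for mini is only my assumption from the observed usage.

```js
// Back-of-the-envelope check. The gpt-4o formula is the documented one;
// the ~33.3x multiplier for gpt-4o-mini is inferred from the usage reports above.

function gpt4oImageTokens(width, height) {
  // Scale down to fit within a 2048x2048 square.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // Scale down so the shortest side is at most 768px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;

  // Count 512px tiles and apply the per-tile cost.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

const omniTokens = gpt4oImageTokens(1280, 720);      // 6 tiles -> 1105 tokens
const miniTokens = Math.round(omniTokens * 100 / 3); // ~36,833 tokens

console.log({ omniTokens, miniTokens });
```

For the 1280x720 example earlier in the thread this gives ~36,800 tokens on mini, which matches the ~37,000 that was reported.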
My understanding is that they are working on this clarification.
So are they misleading users by manipulating the reported token usage so they can publish lower (but practically useless) per-token prices for vision?
1 Like
Here is a statement from the OpenAI Head of DX:
https://x.com/romainhuet/status/1814054938986885550?t=AMFK4svMvCluYqAXUqRDMQ&s=19
So it seems like it works (and costs) as intended.
That makes it not really usable for high-volume vision tasks, which is funny because it is supposed to be a high-volume model.
I can recommend Gemini Flash, with a fixed cost of $0.0001315 per image vs $0.005525 (768px × 1128px) for gpt-4o(-mini); that is around 40 times cheaper while performing great.
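For reference, here is roughly how those per-image numbers work out (token counts from the tile math discussed above; per-1M-token prices as listed on the pricing page when I checked):

```js
// Per-image cost comparison for a 768x1128 image.
const gpt4oTokens = 1105;  // 85 + 170 * 6 tiles
const miniTokens = 36835;  // same image at the ~33.3x multiplier

const gpt4oCost = (gpt4oTokens * 5.0) / 1e6;  // ≈ $0.005525
const miniCost = (miniTokens * 0.15) / 1e6;   // ≈ $0.005525, effectively the same
const geminiFlashCost = 0.0001315;            // fixed per-image price

console.log((miniCost / geminiFlashCost).toFixed(1)); // ≈ 42, i.e. roughly 40x cheaper
```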
2 Likes
Thank you!
After hours of testing, I see some problems with other models too… Assistants do not follow instructions like they did before.
So are they just artificially using more tokens to keep the costs consistent between models?
Pretty much. Processing images costs the same in actual money. Text is cheaper, though.
1 Like
I definitely don't fully understand how pricing and tokens work for images, but does this mean the mini model could actually process images more cheaply (for the user), but they basically won't let it?
Only OpenAI knows what it can or can't do. For us, the situation is that images cost the same amount of money on both models.
2 Likes
Well, if anybody from OpenAI is listening, the processing of text images should be WAY cheaper. It doesn't make sense to price an image of an invoice the same as an image of the Mona Lisa. How do they expect businesses to embrace this technology when they make the cost of processing their basic documents so prohibitive?
And yes, I know we can extract text from PDFs and images using OCR, but there are many cases, and I can think of one big one, where we could really benefit from letting the model examine the documents first.