I’m making one request to gpt-4o-mini in the Playground. The request has one 838x815 image (black and white, with text) and a 30-word prompt. In the Usage display I see that this one request cost 153,491 input tokens. Is this correct?
Hi,
Welcome to the forum.
As I understand it:
Text works out to roughly 4 characters per token.
For images, however, tokens are counted differently, based on the size of the image, whether for image generation or for vision input.
There is more information on the pricing page.
85 base tokens + (170 × 4 tiles) = 765 tokens for one image
765 × 33.33 (the gpt-4o-mini image price factor) ≈ 25,500 tokens
That still doesn’t add up to 153,491… until you multiply it by 6.
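For the 838x815 image in question, here is a rough sketch of that tile formula (the helper function and the 33.33 factor are my reading of the pricing page, not official code):

```python
import math

def image_tokens(width: float, height: float, detail: str = "high") -> int:
    """Commonly cited gpt-4o vision token formula: 85 base + 170 per 512px tile."""
    if detail == "low":
        return 85  # low detail is a flat 85 tokens regardless of size
    # Scale down to fit within 2048x2048 (a no-op for this image).
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale down again so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Count 512px tiles: here ceil(790/512) * ceil(768/512) = 2 * 2 = 4 tiles.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

per_image = image_tokens(838, 815)      # 765 tokens on gpt-4o
on_mini = round(per_image * 33.33)      # ~25,500 billed tokens on gpt-4o-mini
print(per_image, on_mini, 6 * on_mini)  # six images in context -> ~153k, plus text
```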
Be mindful that images from earlier turns of the chat are also counted again.
Any internal iterations during tool use mean the image inputs in the context are billed again as well, because the context is re-sent in another API model call.
This should be communicated clearly: don’t use gpt-4o-mini for images. It costs twice as much as using gpt-4o.
Developers don’t get to offer a cheap vision product.
- The free image understanding comes with a free ChatGPT account, which can then prompt you to upgrade to paid OpenAI services.
That was it! It wasn’t obvious to me, but indeed in the Playground UI all the previous requests are apparently sent along with the last request. The token count makes sense now: after I clicked ‘Clear’, Usage shows ~25k input tokens per request.
Thanks for the replies. Given that mini is not that cheap with images, I’ll switch to another model.
I would think that is a bug… why would they resend the images in the Playground, know about it (in ChatGPT this is not the case), and leave it like that?
I mean, that will also change the whole conversation; it can even take the model completely out of context. Say you have an image with a duck and a dog that are exactly the same size, and you prompt something like “which of the animals needs more space?” Sometimes it could refer to “space inside the image” and sometimes to “space in the real world”, and in the next prompt you have a calculation based on that outcome… and then in the end you add the last image, ask for the final result, and it will be completely different.
And it is not uncommon in the USA to measure stuff with objects… like “this building is 7453 hot dogs tall”…
OK, the single interactions are not recreated… of course, so maybe I was thinking wrongly here… but there should still be some sort of conflict in the result when the images themselves are re-sent instead of the result of the first recognition.
A conversation history is just that: A record of what has happened before.
Imagine this:
User: what’s in this image {image}
assistant: a dog
User: what’s in this image {image}
assistant: a cat
User: what do these two images have in common?
assistant: I’m sorry, I did not receive any images.
That’s you with no conversation history.
With conversation history, the first user message cost one image, the second one cost two images, and the third also had two images sent to the API.
The chat playground allows one to press “submit” without the basic understanding that everything seen is sent again.
assistant: I’m sorry, I did not receive any images. << this response is kind of strange, since from the conversation history it knows it had received two images, and it could compare the two inferences…
The “I’m sorry, I did not receive any images.” looks more like heavy overfitting, or even something hardcoded in the model’s code.
I demonstrate why a conversation history is necessary, and why, if you are creating an API chatbot to which the user can provide images, you would need to keep sending past images back in future API calls. Who knows when the user might refer back to a past image?
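A minimal sketch of such a chatbot loop with the Chat Completions API (the model choice, example URLs, and the ask() helper are placeholders of mine, not from this thread):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
messages = []      # the growing conversation history

def ask(text: str, image_url: str | None = None) -> str:
    """Append a user turn (optionally with an image) and re-send the full history."""
    content = [{"type": "text", "text": text}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    messages.append({"role": "user", "content": content})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

ask("what's in this image?", "https://example.com/dog.png")  # bills 1 image
ask("what's in this image?", "https://example.com/cat.png")  # bills 2 (past image re-sent)
ask("what do these two images have in common?")               # still bills 2 images
```

Every call re-sends everything above it, which is exactly why image billing grows with the chat.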
The AI model won’t be all that confused by past pictures of dogs and cats when you then have it continue with other questions or other images.
This topic is about the cost: why they were billed for six images when interacting in the Playground. Likely because a growing chat kept sending more past images back.