I’m making one request to gpt-4o-mini in the Playground. The request has one 838x815 image (black and white, with text) and a 30-word prompt. In the Usage display I see that this one request cost 153,491 input tokens. Is this correct?
Hi,
Welcome to the forum.
As I understand it:
Text works out to roughly 4 characters per token.
For images, however, tokens are counted differently, based on the size of the image, whether for image generation or for vision input.
There is more information on the pricing page.
85 base tokens + (170 × 4 tiles) = 765 tokens for one image
765 × 33.33 (the gpt-4o-mini image price factor) ≈ 25,500 tokens
That still doesn’t add up to 153,491… until you multiply it by 6.
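For the 838x815 image in question, here is a rough sketch of that tile formula (the helper function and the 33.33 factor are my reading of the pricing page, not official code):

```python
import math

def image_tokens(width: float, height: float, detail: str = "high") -> int:
    """Commonly cited gpt-4o vision token formula: 85 base + 170 per 512px tile."""
    if detail == "low":
        return 85  # low detail is a flat 85 tokens regardless of size
    # Scale down to fit within 2048x2048 (a no-op for this image).
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale down again so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Count 512px tiles: here ceil(790/512) * ceil(768/512) = 2 * 2 = 4 tiles.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

per_image = image_tokens(838, 815)      # 765 tokens on gpt-4o
on_mini = round(per_image * 33.33)      # ~25,500 billed tokens on gpt-4o-mini
print(per_image, on_mini, 6 * on_mini)  # six images in context -> ~153k, plus text
```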
Be mindful that images from earlier turns of the chat are also counted again.
Any internal iterations during tool use mean the image inputs in the context are billed again as well, because the context is re-sent in another API model call.
This should be communicated clearly: don’t use gpt-4o-mini for images. It costs twice as much as using gpt-4o.
Developers don’t get to offer a cheap vision product.
- The free image understanding comes with a free ChatGPT account, which can then prompt you to upgrade to paid OpenAI services.
That was it! It wasn’t obvious to me, but indeed in the Playground UI all the previous requests are apparently sent along with the last request. The token count makes sense now: after I clicked ‘Clear’, Usage shows ~25k input tokens per request.
Thanks for the replies. Given that mini is not that cheap with images, I’ll switch to another model.
I would think that is a bug… why would they resend the images in the Playground, know about it (in ChatGPT this is not the case), and leave it like that?
I mean, that will also change the whole conversation; it can even take the model completely out of context. Say you have an image with a duck and a dog that are exactly the same size, and you prompt something like “which of the animals needs more space?” Sometimes it could refer to “space inside the image” and sometimes to “space in the real world”, and in the next prompt you have a calculation based on that outcome… and then in the end you add the last image, ask for the final result, and it will be completely different.
And it is not uncommon in the USA to measure stuff with objects… like “this building is 7453 hot dogs tall”…
OK, the single interactions are not recreated… of course, so maybe I was thinking wrongly here… but there should still be some sort of conflict in the result when the images themselves are re-sent instead of the result of the first recognition.
A conversation history is just that: A record of what has happened before.
Imagine this:
User: what’s in this image {image}
assistant: a dog
User: what’s in this image {image}
assistant: a cat
User: what do these two images have in common?
assistant: I’m sorry, I did not receive any images.
That’s you with no conversation history.
With conversation history, the first user message cost one image, the second one cost two images, and the third also had two images sent to the API.
The chat playground allows one to press “submit” without the basic understanding that everything seen is sent again.
assistant: I’m sorry, I did not receive any images. << this response is kind of strange, since from the conversation history it knows it had received two images, and it could compare the two inferences…
The “I’m sorry, I did not receive any images.” looks more like heavy overfitting, or even something hardcoded in the model’s code.
I demonstrate why a conversation history is necessary, and why, if you are creating an API chatbot to which the user can provide images, you would need to keep sending past images back in future API calls. Who knows when the user might refer back to a past image?
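A minimal sketch of such a chatbot loop with the Chat Completions API (the model choice, example URLs, and the ask() helper are placeholders of mine, not from this thread):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
messages = []      # the growing conversation history

def ask(text: str, image_url: str | None = None) -> str:
    """Append a user turn (optionally with an image) and re-send the full history."""
    content = [{"type": "text", "text": text}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    messages.append({"role": "user", "content": content})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

ask("what's in this image?", "https://example.com/dog.png")  # bills 1 image
ask("what's in this image?", "https://example.com/cat.png")  # bills 2 (past image re-sent)
ask("what do these two images have in common?")               # still bills 2 images
```

Every call re-sends everything above it, which is exactly why image billing grows with the chat.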
The AI model won’t be all that confused by past pictures of dogs and cats when you then have it continue with other questions or other images.
This topic is about the cost: why they were billed for six images when interacting in the Playground. Likely because a growing chat kept sending more past images back.