Hi everyone,
I came across something odd while testing the image input capabilities via the API. According to the OpenAI documentation, image token usage should scale with the image's dimensions and the detail setting. However, when I ran the example curl
command from the docs (with a single image), I got a total token usage of 36,912 tokens.
Here is the exact request I used (copied from the docs):
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "what is in this image?"},
          {
            "type": "input_image",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        ]
      }
    ]
  }'
And here’s the relevant part of the response:
"usage": {
"input_tokens": 36848,
"output_tokens": 64,
"total_tokens": 36912
}
The output from the model was a short, simple description of the image, so the output_tokens value makes sense. But nearly 37k input tokens for a single 2560 × 1669 px image seems excessive.
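For reference, here is a rough sketch of the tile-based token estimate described in OpenAI's image pricing docs. The constants I'm plugging in for gpt-4o-mini (2833 base tokens plus 5667 per 512 px tile, versus 85 + 170 for gpt-4o) are my reading of the pricing page and may be out of date, so treat this as an approximation rather than the official calculation:

import math

def estimate_image_tokens(width, height, base_tokens=2833, tile_tokens=5667):
    # Assumed high-detail scaling rules: fit the image within 2048 x 2048,
    # then shrink so the shortest side is at most 768 px, then count 512 px tiles.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return base_tokens + tiles * tile_tokens

print(estimate_image_tokens(2560, 1669))  # 36835 with the assumed gpt-4o-mini constants

With those constants the estimate lands within a few tokens of the 36,848 input_tokens above (the remainder presumably being the short text prompt), which is exactly why I'm wondering whether this per-tile multiplier is really intended for gpt-4o-mini.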
Is this expected behavior? Could it be a bug or miscalculation in the token estimation for image inputs?
Would love to hear if others are seeing the same.
Thanks!