Vision token counts do not match the documentation

I believe the OpenAI API no longer produces the expected results when we use the image token counting method described in the documentation. Given a 1024x1024 image:

  • According to the vision docs, the token count should be 765 with detail=high, and 85 with detail=low.
  • The cost calculator for gpt-4o agrees with the values above, but the calculator for gpt-4o-mini returns much higher values: 25501 with detail=high and 2833 with detail=low (see the sketch just below).
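
For reference, here is a minimal sketch of the tile-based calculation I take the vision docs and the pricing calculator to be using. The gpt-4o values (85 base, 170 per tile) come from the docs; the gpt-4o-mini values (2833 base, 5667 per tile) are inferred from the calculator numbers above rather than quoted from the docs.

from math import ceil

def expected_image_tokens(width, height, detail="high", base=85, per_tile=170):
    # Low detail is a flat base cost regardless of image size.
    if detail == "low":
        return base
    # High detail: fit within 2048x2048, then scale the shortest side to 768px
    # (behaviour for images already smaller than that is less clearly documented).
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    scale = 768 / min(width, height)
    width, height = width * scale, height * scale
    # Count 512x512 tiles and add the per-tile cost to the base cost.
    tiles = ceil(width / 512) * ceil(height / 512)
    return base + per_tile * tiles

print(expected_image_tokens(1024, 1024, "high"))              # 765
print(expected_image_tokens(1024, 1024, "low"))               # 85
print(expected_image_tokens(1024, 1024, "high", 2833, 5667))  # 25501
print(expected_image_tokens(1024, 1024, "low", 2833, 5667))   # 2833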

However, this is not the case when I send the images via the OpenAI Python SDK (v1.58.1). Here's my snippet:

from openai import OpenAI
from itertools import product
import mimetypes  # used by the base64 variant shown further down
import base64     # used by the base64 variant shown further down


client = OpenAI()

# Send the same image to both models at both detail levels and keep the
# full responses so the reported usage can be compared.
responses = {}
for model, img_detail in product(
    ["gpt-4o-mini-2024-07-18", "gpt-4o-2024-08-06"], ["high", "low"]
):
    responses[(model, img_detail)] = client.chat.completions.create(
        model=model,
        max_tokens=1,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://live.staticflickr.com/7151/6760135001_58b1c5c5f0_b.jpg",
                            "detail": img_detail,
                        },
                    },
                ],
            },
        ],
    )

# Print the prompt token count the API reports for each (model, detail) pair.
for (_, img_detail), model_response in responses.items():
    print(
        f"{img_detail=}, {model_response.model=}, {model_response.usage.prompt_tokens=}"
    )

And the output:

img_detail='high', model_response.model='gpt-4o-mini-2024-07-18', model_response.usage.prompt_tokens=864
img_detail='low', model_response.model='gpt-4o-mini-2024-07-18', model_response.usage.prompt_tokens=303
img_detail='high', model_response.model='gpt-4o-2024-08-06', model_response.usage.prompt_tokens=864
img_detail='low', model_response.model='gpt-4o-2024-08-06', model_response.usage.prompt_tokens=303

The values for gpt-4o seem wrong, and the values for gpt-4o-mini are now equal to those of gpt-4o for some reason.

I do get the expected inflation of image costs when using mini: in dollar terms, roughly double that of gpt-4o.

gpt-4o-2024-08-06: prompt usage: 111, rate usage: 794
gpt-4o-mini-2024-07-18: prompt usage: 2859, rate usage: 795

(A request with zero images is 19 prompt_tokens.)

This is when sending a 512x512 image at low detail, as base64. It also shows the impact on the usage limits reported by the rate-limit headers. The prompt token count is the same when using your external URL, and it is also the value reported by streaming usage.
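
For context (this is not from the original reply): a minimal sketch of how the streamed usage figure can be obtained, assuming the documented stream_options={"include_usage": True} flag; the image URL is simply the one from the first post, reused for illustration.

from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    max_tokens=1,
    stream=True,
    stream_options={"include_usage": True},  # request a final usage chunk
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://live.staticflickr.com/7151/6760135001_58b1c5c5f0_b.jpg",
                        "detail": "low",
                    },
                },
            ],
        },
    ],
)
for chunk in stream:
    if chunk.usage is not None:  # usage arrives only on the final chunk
        print(chunk.usage.prompt_tokens)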

Programming note: clear the value of the returned chat_completion object between calls, so a failure doesn't silently reuse the previous response object.
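
A minimal sketch of that pattern (the helper and its name are illustrative, not from the post):

def call_with_reset(client, **request_kwargs):
    # Start each attempt with chat_completion cleared, so a failed call
    # cannot silently hand back the previous response object.
    chat_completion = None
    try:
        chat_completion = client.chat.completions.create(**request_kwargs)
    except Exception as err:
        print(f"API call failed: {err}")
    return chat_completion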

Solution: You might find it useful to actually change models during iteration loops instead of just using the one that is hardcoded. :face_with_raised_eyebrow:


Oops, that was a silly mistake I made when cleaning up my snippet for the forum post. I have made edits in the original post, and sadly the output stays the same. The problem is still there when I send the base64-encoded image directly. Here's my convert function:

def convert_to_data_uri(path):
    # Read a local image file and return it as a base64 data URI
    # suitable for the image_url field.
    path = str(path)
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded_content = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded_content}"

# and then below:
...
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": convert_to_data_uri("./earth.jpg"),
                        "detail": img_detail,
                    },
                },
            ],
        },
    ],
...

gpt-4o-2024-08-06: prompt usage: 111, rate usage: 794

Isn't the prompt token count supposed to be 85? And what do you mean by 'rate usage'?

Rate usage: the HTTP response from the API comes with rate-limit headers. You can use the Python SDK's with_raw_response method to get them, though the response then needs different parsing. Those headers track your rate limiting, and the token figure in them is a rough estimate rather than the billed count.
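
A minimal sketch of what that looks like, assuming the documented x-ratelimit-* header names:

from openai import OpenAI

client = OpenAI()
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-2024-08-06",
    max_tokens=1,
    messages=[{"role": "user", "content": "hi"}],
)
completion = raw.parse()  # the usual ChatCompletion object
print(completion.usage.prompt_tokens)                   # billed prompt usage
print(raw.headers.get("x-ratelimit-remaining-tokens"))  # rate limiter's estimate
print(raw.headers.get("x-ratelimit-limit-tokens"))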


What you report is indeed what I see with a non-streaming response. Only once out of many trials did I get the large count:

API call successful for model: gpt-4o-mini-2024-07-18, image_detail: high
  Model used: gpt-4o-mini-2024-07-18
  Prompt tokens: 25513
--------------------
API call successful for model: gpt-4o-mini-2024-07-18, image_detail: low
  Model used: gpt-4o-mini-2024-07-18
  Prompt tokens: 308
--------------------
API call successful for model: gpt-4o-2024-08-06, image_detail: high
  Model used: gpt-4o-2024-08-06
  Prompt tokens: 869
--------------------
API call successful for model: gpt-4o-2024-08-06, image_detail: low
  Model used: gpt-4o-2024-08-06
  Prompt tokens: 308
--------------------

I also verified it is not the Python SDK rewriting values, by making the JSON requests directly.
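
The kind of direct request meant here looks roughly like this (a sketch using the requests library, with the image URL from the first post):

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini-2024-07-18",
        "max_tokens": 1,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://live.staticflickr.com/7151/6760135001_58b1c5c5f0_b.jpg",
                            "detail": "high",
                        },
                    },
                ],
            },
        ],
    },
)
print(resp.json()["usage"]["prompt_tokens"])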

Perhaps if you are in the "free" discount program for training with API data, this consumption is calculated correctly for you. (Or the massive cost inflation is being retired soon.)

One can also speculate that images fetched from a remote URL have their image-to-token processing cached, a possible discount not previously offered.
