Cost of Vision using GPT-4o

I’m trying to calculate the cost per image processed using Vision with GPT-4o. I’m passing a series of JPEG files, base64-encoded, as image content with detail set to low:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

history = []
num_prompt_tokens = 0
num_completion_tokens = 0
num_total_tokens = 0

# file_contents maps filename -> base64-encoded JPEG bytes
for filename, file_content in file_contents.items():
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                      "type": "image_url",
                      "image_url": {
                        "url": f"data:image/jpeg;base64,{file_content}",
                        "detail": "low"
                      },
                    },
                ],
            }
        ],
        max_tokens=256,
    )

    num_prompt_tokens += response.usage.prompt_tokens
    num_completion_tokens += response.usage.completion_tokens
    num_total_tokens += response.usage.total_tokens

    history.append(response.choices[0].message.content)

According to the pricing information (https://platform.openai.com/docs/guides/vision), every low-detail image costs a flat 85 tokens. However, response.usage.prompt_tokens racks up 12,077 tokens in total across my 13 requests. The text prompt is only 156 input tokens, so the text portion should be at most 13 × 156 = 2,028 tokens, and with the images I’d expect (13 × 85) + (13 × 156) = 3,133 total input tokens. Where are the extra prompt tokens coming from? Am I safe to use my 3,133 estimate as the total input tokens, or is the reported usage the correct amount to base costs on?
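For reference, this is a sketch of how I’d turn the accumulated usage numbers into dollars. The rates below are the ones listed for gpt-4o-2024-05-13 at the time ($5 per 1M input tokens, $15 per 1M output tokens); double-check the current pricing page before relying on them:

```python
# Assumed rates for gpt-4o-2024-05-13, USD per token; verify against
# the current pricing page before using these numbers for billing.
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of a run from its accumulated usage stats."""
    return prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE

# Using the totals accumulated in the loop above:
#   cost = request_cost(num_prompt_tokens, num_completion_tokens)
```

Whichever token count turns out to be correct, the billed amount should follow whatever the API reports in `response.usage`.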

It appears GPT-4o uses a different method to calculate input image tokens: the documented 170 × tiles + 85 equation doesn’t match my test results.

I tested different image sizes using a single user message containing only the image. The tokens used by the image are prompt_tokens − 7, where 7 is the fixed per-message overhead.
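A small helper for isolating the image’s share of the prompt tokens. The 7-token overhead is my empirical observation for a single user message with no text part, not a documented constant:

```python
# Empirically observed fixed overhead for a single user message with
# no text part (not documented by OpenAI; may change between models).
MESSAGE_OVERHEAD = 7

def image_tokens(prompt_tokens: int) -> int:
    """Attribute the remainder of prompt_tokens to the image itself."""
    return prompt_tokens - MESSAGE_OVERHEAD

# e.g. if a request with one image reports 772 prompt tokens,
# the image itself would be attributed 765 tokens.
```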

{
  "model": "gpt-4o-2024-05-13",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,'"${imageb64}"'",
            "detail": "high"
          }
        }
      ]
    }
  ],
  "max_tokens": 20
}

The result is as follows:

I haven’t figured out the new equation yet.
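For comparison, here is my reading of the documented formula that GPT-4o apparently deviates from (a sketch based on the vision guide: images are scaled to fit in a 2048 × 2048 square, then scaled so the shortest side is at most 768 px, then counted in 512 px tiles at 170 tokens each plus an 85-token base):

```python
import math

def expected_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Token count per the documented GPT-4 vision tile formula.

    GPT-4o seems to deviate from this, which is exactly what the
    measurements above suggest.
    """
    if detail == "low":
        return 85  # low detail is a flat 85 tokens regardless of size
    # Scale to fit within 2048 x 2048, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Then scale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Count 512 px tiles: 170 tokens per tile plus the 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

Comparing this function’s output against measured prompt_tokens − 7 for the same images should make the divergence explicit.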
