What is the token cost for image prompt in GPT-4o?

I am testing the image-captioning capability of GPT-4o (Azure OpenAI) through the Python API. According to the OpenAI guide here, processing an image with "detail": "low" should consume 85 tokens. However, my experiment shows that only 73 tokens are being used. Am I missing something here?
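For reference, this is roughly how I am making the call (the endpoint, key, deployment name, and image URL below are placeholders):

```python
from openai import AzureOpenAI

# Placeholder credentials and deployment name -- substitute your own.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-gpt-4o-deployment>",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/sample.jpg",  # placeholder image
                        "detail": "low",
                    },
                }
            ],
        }
    ],
)

# prompt_tokens is what I am comparing against the 85 tokens in the guide
print(response.usage)
```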

This is part of the output from the GPT-4o response. The prompt_tokens field shows that only 73 tokens were consumed (I sent only the image, with no other prompt).

ChatCompletion(..., model='gpt-4o-2024-05-13', ..., usage=CompletionUsage(completion_tokens=97, prompt_tokens=73, total_tokens=170),...)

Yes, GPT-4o uses a different token encoder, so the same information can be encoded into a different number of tokens.

Exactly how images are tokenized or embedded for understanding is OpenAI's secret, and one "tile" doesn't come out to a round number, so the practical answer is: the consumption is lower and a bit unpredictable.
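As a rough illustration of the encoder difference (this only shows text tokenization, and assumes a tiktoken version recent enough to know about gpt-4o; the image accounting itself is not exposed):

```python
import tiktoken

# GPT-4o maps to the o200k_base encoding; GPT-4 / GPT-4 Turbo use cl100k_base.
enc_4o = tiktoken.encoding_for_model("gpt-4o")
enc_4 = tiktoken.encoding_for_model("gpt-4")

text = "Describe this image in one sentence."
# Token counts for the same text typically differ between the two encodings.
print(enc_4o.name, len(enc_4o.encode(text)))
print(enc_4.name, len(enc_4.encode(text)))
```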


That explained everything. Thanks!