Conflicting Info About the Cost of detail:low images

Hi,
In the following document, there is conflicting info about the cost of detail:low images. In one place it says 65 tokens; in another place it says 85 tokens.

https://platform.openai.com/docs/guides/vision


Yes, I don’t know where they got that “65” from. Perhaps they’re misremembering.

The minimum tokens for an image is 85. The overhead of the first message is 7 tokens.

A tiny request with a single image to ask about (but we don’t actually ask anything, just upload it) = 92 prompt tokens.
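Put another way, here’s a quick sketch of the expectation (the 85-per-image and 7-token-overhead figures above are the assumption):

# expected prompt tokens, assuming 85 tokens per detail:low image
# plus 7 tokens of overhead for the first message
def expected_prompt_tokens(n_images: int) -> int:
    return 7 + 85 * n_images

print(expected_prompt_tokens(1))  # 92

The script below checks that 92 against the live API: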

from openai import OpenAI
client = OpenAI()
# two tiny base64-encoded PNG test images
example_images = [
'iVBORw0KGgoAAAANSUhEUgAAAIAAAABACAMAAADlCI9NAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAAZQTFRF////MzMzOFSMkQAAAPJJREFUeNrslm0PwjAIhHv//09rYqZADzOBqMnu+WLTruOGvK0lhBBCCPHH4E7x3pwAfFE4tX9lAUBVwZyAYjwFAeikgH3XYxn88nzKbIZly4/BluUlIG66RVXBcYd9TTQWN+1vWUEqIJQI5nqYP6scl84UqUtEoLNMjoqBzFYrt+IF1FOTfGsqIIlcgAbNZ0Uoxtu6igB+tyBgZhCgAZ8KyI46zYQF/LksQC0L3gigdQBhgGkXou1hF1XebKzKXBxaDsjCOu1Q/LA1U+Joelt/9d2QVm9MjmibO2mGTEy2ZyetsbdLgAQIIYQQQoifcRNgAIfGAzQQHmwIAAAAAElFTkSuQmCC',
'iVBORw0KGgoAAAANSUhEUgAAAIAAAABACAMAAADlCI9NAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAAZQTFRF////AAAAVcLTfgAAAPRJREFUeNrsllEKwzAMQ+37X3owBm0c2VZCIYXpfXVBTd9qx5uZEEIIIcQr8IHjAgcc/LTBGwSiz5sEoIwTKwuxVCAW5XsxFco3Y63A3BawVWDMiFgiMD5tvELNuh/r5sA9Nu1yiYaXvBBLBawUAGubsZU5UOy8HkNvINoAv27nMVZ1WC1wfwrspPk2FDMiVpYknNu6uIxAVWQsgBoSCCQxI2KEANFdXccXseZzuKMQQDFmt6pPwU9CL+CcADEJr6qFA1aWYIgZEesGEVgmTsGvfYyIdaPYwp6JwBRL5kD4Hs7+VWGSz8aEEEIIIYQQ/8VHgAEAxPsD+SYeZ2QAAAAASUVORK5CYII='
]
# a single user message containing just the first image, sent with detail:low
user_tiled_image_message = [
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{example_images[0]}", "detail": "low"}
      },
    ]
  }
]

# raw-response wrapper so we can print the full JSON body, including usage
response = client.chat.completions.with_raw_response.create(
    model="gpt-4-vision-preview",
    messages=user_tiled_image_message,
    max_tokens=10, top_p=1e-19, temperature=1e-29,  # short, near-deterministic reply
)

#print(response.http_request.content.decode())   #"request" object
print(response.http_response.content.decode())  #"response" object
print(response.elapsed.total_seconds())

A response with an image description when only sending an image (no text in the message):
{"id": "chatcmpl-123456789", "object": "chat.completion", "created": 1709306528, "model": "gpt-4-1106-vision-preview", "usage": {"prompt_tokens": 92, "completion_tokens": 10, "total_tokens": 102}, "choices": [{"message": {"role": "assistant", "content": "The image displays the word \"Apple\" in a"}, "finish_reason": "length", "index": 0}]}

Another array method also has the same minimum per image.
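If you don’t want to parse the raw HTTP body, the same check can be done through the usage object on a normal create() call. A minimal sketch (reusing client and example_images from above; the 85-per-image plus 7-token-overhead figure is the expectation being tested):

# send one, then two, detail:low images and read usage.prompt_tokens
for n in (1, 2):
    content = [
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{img}", "detail": "low"},
        }
        for img in example_images[:n]
    ]
    r = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": content}],
        max_tokens=1,
    )
    print(n, "image(s):", r.usage.prompt_tokens, "prompt tokens; expected", 7 + 85 * n)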


You’re missing the flag for low-res processing.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4-vision-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            "detail": "high"
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)
  • low will enable the “low res” mode. The model will receive a low-res 512px x 512px version of the image, and represent the image with a budget of 65 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that do not require high detail.
  • high will enable “high res” mode, which first allows the model to see the low res image and then creates detailed crops of input images as 512px squares based on the input image size. Each of the detailed crops uses twice the token budget (65 tokens) for a total of 129 tokens.
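To actually get the “low res” pricing the first bullet describes, the image part just swaps the detail value (same URL as the snippet above; a minimal sketch):

# the same image part as above, but with the low-res flag set
low_detail_image_part = {
    "type": "image_url",
    "image_url": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        "detail": "low",  # instead of "high"
    },
}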

I’m not missing the “detail” parameter. Scroll right in the code box.
You posted the documentation that is in error, about “65 tokens”.

You can still feel free to try to get an image described for under 90 prompt tokens by whatever mechanism you think could do that, though…
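For reference, the cost-calculation part of that same page is where the 85 comes from: low detail is a flat 85 tokens, and high detail resizes the image and then charges 170 tokens per 512px tile on top of an 85-token base. A rough sketch of that math as I read it (my own helper, not anything from the API):

import math

# detail:low  -> flat 85 tokens
# detail:high -> fit within 2048x2048, scale shortest side to 768px,
#                then 170 tokens per 512px tile plus an 85-token base
def image_tokens(width: int, height: int, detail: str = "low") -> int:
    if detail == "low":
        return 85
    scale = min(1.0, 2048 / max(width, height))   # fit within 2048 x 2048
    width, height = width * scale, height * scale
    scale = 768 / min(width, height)              # shortest side to 768px
    width, height = width * scale, height * scale
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

print(image_tokens(512, 512, "low"))     # 85 -- never 65
print(image_tokens(1024, 1024, "high"))  # 765

Either way, the floor for any image is 85 tokens, not 65.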

Ah, my bad. I totally did not scroll to the end and thought it was missing below haha. Thanks for pointing that out.
