Hi, how do I count how many tokens each image uses with the gpt-4-vision-preview model?

According to the pricing page, every image is first globally described by **85** base tokens.

## Tiles

To be fully recognized, an image is covered by 512x512-pixel tiles.

Each tile provides **170** tokens. So, by default, the formula is the following:

total tokens = **85 + 170 * n**, where n = the number of tiles needed to cover your image.

## Implementation

This can be computed as follows:

```
from math import ceil

def count_image_tokens(width: int, height: int) -> int:
    # Number of 512x512 tiles needed to cover the image
    w = ceil(width / 512)
    h = ceil(height / 512)
    n = w * h
    # 85 base tokens + 170 tokens per tile
    return 85 + 170 * n
```

or in one line if you prefer:

```
count_total_tokens = lambda w, h: 85 + 170 * ceil(w / 512) * ceil(h / 512)
```
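As a quick sanity check, the one-liner reproduces the examples in the next section:

```
from math import ceil

count_total_tokens = lambda w, h: 85 + 170 * ceil(w / 512) * ceil(h / 512)

print(count_total_tokens(500, 500))  # 255
print(count_total_tokens(513, 500))  # 425
print(count_total_tokens(513, 513))  # 765
```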

## Some examples

- 500x500 → 1 tile is enough to cover it, so total tokens = 85 + 170 = 255
- 513x500 → you need 2 tiles → total tokens = 85 + 170*2 = 425
- 513x513 → you need 4 tiles → total tokens = 85 + 170*4 = 765

## `low_resolution` mode

In *“low resolution”* mode, there are no tiles; only the **85** base tokens remain, no matter the size of your image.
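If you want a single helper that covers both modes, here is a minimal sketch following the simplified formula above (the `detail` parameter name and its `"low"`/`"high"` values mirror the API's image-detail setting, but this helper itself is hypothetical):

```
from math import ceil

def count_tokens(width: int, height: int, detail: str = "high") -> int:
    # Low-resolution mode: flat 85 tokens, regardless of image size
    if detail == "low":
        return 85
    # High-resolution mode: 85 base tokens + 170 per 512x512 tile
    n = ceil(width / 512) * ceil(height / 512)
    return 85 + 170 * n
```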
