Vision model rate limit documentation for usage estimate is wrong

The documentation currently says:

How do rate limits for GPT-4 with Vision work?

We process images at the token level, so each image we process counts towards your tokens per minute (TPM) limit. See the calculating costs section for details on the formula used to determine token count per image.

In fact, the rate-limit charge for both text tokens and image tokens is only an estimate.

For images, it is a very poor estimate, or really no estimate at all.

A fixed charge against the rate limit is applied per image, depending solely on whether detail “high” or “low” is specified. Image size and contents are not inspected, even when the image is sent as base64.

Currently that charge appears to be around 800 tokens per image for either high or low (about 770 per image in the logs below). That is despite a low-detail image actually consuming 75-85 tokens, and a high-detail image approaching 2000.
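For comparison, here is a minimal sketch of the documented per-image formula for gpt-4o as I understand it: a flat 85 tokens for low detail, and 85 plus 170 per 512px tile for high detail after the image is shrunk to fit 2048x2048 and then to a shortest side of 768. The function name is my own, and the "never upscale small images" behaviour is an assumption that happens to agree with the tile counts implied by the logs. The logged gpt-4o prompt usage is the 45 text tokens plus this value plus a few tokens I have not accounted for.

```python
import math
from fractions import Fraction


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the tokens for one image (hypothetical helper, gpt-4o numbers)."""
    if detail == "low":
        return 85  # flat cost; image size is ignored

    # Exact fractions avoid float rounding pushing a side just over a tile edge.
    w, h = Fraction(width), Fraction(height)

    # Step 1: shrink to fit inside 2048x2048 (assumption: never upscale).
    fit = min(Fraction(1), Fraction(2048, max(width, height)))
    w, h = w * fit, h * fit

    # Step 2: shrink so the shortest side is at most 768 (again, no upscaling).
    shrink = min(Fraction(1), Fraction(768) / min(w, h))
    w, h = w * shrink, h * shrink

    # Step 3: count 512px tiles; 170 tokens per tile plus an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

By this sketch a 400x400 high-detail image is 255 tokens and a 1600x800 one is 1105, while the rate limiter charges the same flat amount for both.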

Source: the x-ratelimit-remaining-tokens response header.
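One way to turn that header into a "rate usage" figure is sketched below. It assumes the token bucket was full before the request (no other traffic on the key, and enough idle time for a full reset), so the charge is simply the limit minus what remains. The function name is hypothetical, and the limit value in the usage line is made up for illustration.

```python
def rate_usage_from_headers(headers: dict) -> int:
    """Tokens charged against TPM by a single request.

    Assumes the bucket was full before the call, so the charge is
    x-ratelimit-limit-tokens minus x-ratelimit-remaining-tokens.
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    limit = int(lowered["x-ratelimit-limit-tokens"])
    remaining = int(lowered["x-ratelimit-remaining-tokens"])
    return limit - remaining


# Hypothetical header values: a 30000 TPM limit with 29157 left afterwards.
example = {
    "x-ratelimit-limit-tokens": "30000",
    "x-ratelimit-remaining-tokens": "29157",
}
usage = rate_usage_from_headers(example)
```

If the bucket is not full beforehand, the difference also includes earlier requests, so the measurement needs a quiet key.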


Logging follows. “Estimated tokens” is my client-side token count of the multi-modal messages, “prompt usage” is the usage returned by the API (streaming), and “rate usage” is the impact on the TPM limit reported by the headers. Images are generated algorithmically and sent as base64:

gpt-4o-2024-08-06; detail:high

Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x400, Images: 1, Prompt usage: 307, Estimated tokens: 307, Rate usage: 843
Size: 400x800, Images: 1, Prompt usage: 477, Estimated tokens: 477, Rate usage: 843
Size: 800x400, Images: 1, Prompt usage: 477, Estimated tokens: 477, Rate usage: 843
Size: 800x800, Images: 1, Prompt usage: 817, Estimated tokens: 817, Rate usage: 843
Size: 1200x400, Images: 1, Prompt usage: 647, Estimated tokens: 647, Rate usage: 843
Size: 1200x800, Images: 1, Prompt usage: 1157, Estimated tokens: 1157, Rate usage: 843
Size: 1600x400, Images: 1, Prompt usage: 817, Estimated tokens: 817, Rate usage: 843
Size: 1600x800, Images: 1, Prompt usage: 1157, Estimated tokens: 1157, Rate usage: 843
Size: 400x400, Images: 10, Prompt usage: 2674, Estimated tokens: 2674, Rate usage: 7788
Size: 400x800, Images: 10, Prompt usage: 4374, Estimated tokens: 4374, Rate usage: 7788
Size: 800x400, Images: 10, Prompt usage: 4374, Estimated tokens: 4374, Rate usage: 7788
Size: 800x800, Images: 10, Prompt usage: 7774, Estimated tokens: 7774, Rate usage: 7788
Size: 1200x400, Images: 10, Prompt usage: 6074, Estimated tokens: 6074, Rate usage: 7788
Size: 1200x800, Images: 10, Prompt usage: 11174, Estimated tokens: 11174, Rate usage: 7788
Size: 1600x400, Images: 10, Prompt usage: 7774, Estimated tokens: 7774, Rate usage: 7788
Size: 1600x800, Images: 10, Prompt usage: 11174, Estimated tokens: 11174, Rate usage: 7788

gpt-4o-2024-08-06; detail:low

Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 400x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 400x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 800x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 800x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1200x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1200x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1600x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1600x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 400x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 400x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 800x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 800x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1200x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1200x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1600x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1600x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720

gpt-4o-mini-2024-07-18; detail:high

Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x400, Images: 1, Prompt usage: 8545, Estimated tokens: 300, Rate usage: 835
Size: 400x800, Images: 1, Prompt usage: 14212, Estimated tokens: 470, Rate usage: 835
Size: 800x400, Images: 1, Prompt usage: 14212, Estimated tokens: 470, Rate usage: 835
Size: 800x800, Images: 1, Prompt usage: 25546, Estimated tokens: 810, Rate usage: 835
Size: 1200x400, Images: 1, Prompt usage: 19879, Estimated tokens: 640, Rate usage: 835
Size: 1200x800, Images: 1, Prompt usage: 36880, Estimated tokens: 1150, Rate usage: 835
Size: 1600x400, Images: 1, Prompt usage: 25546, Estimated tokens: 810, Rate usage: 835
Size: 1600x800, Images: 1, Prompt usage: 36880, Estimated tokens: 1150, Rate usage: 835
Size: 400x400, Images: 10, Prompt usage: 85045, Estimated tokens: 2595, Rate usage: 7721
Size: 400x800, Images: 10, Prompt usage: 141715, Estimated tokens: 4295, Rate usage: 7720
Size: 800x400, Images: 10, Prompt usage: 141715, Estimated tokens: 4295, Rate usage: 7721
Size: 800x800, Images: 10, Prompt usage: 255055, Estimated tokens: 7695, Rate usage: 7720
Size: 1200x400, Images: 10, Prompt usage: 198385, Estimated tokens: 5995, Rate usage: 7721
Size: 1200x800, Images: 10, Prompt usage: 368395, Estimated tokens: 11095, Rate usage: 7720

detail:low used to get its own, lower overestimate. The obvious fix is to set it to 85 tokens in the rate limiter, so that low-detail images, such as frames from a video stream, aren’t blocked at 1/10th of the rate they should be allowed.
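To put a number on that, here is a rough calculation using the gpt-4o detail:low figures from the logs (rate usage 7720 with 10 images, 70 with none). The 30,000 TPM limit is a hypothetical value chosen only for illustration, and the helper names are my own.

```python
def per_image_charge(rate_usage_with_images: int, rate_usage_text_only: int,
                     n_images: int) -> float:
    """Rate-limit tokens charged per image, with the text-only charge removed."""
    return (rate_usage_with_images - rate_usage_text_only) / n_images


def images_per_minute(tpm_limit: int, tokens_per_image: float) -> int:
    """How many images fit in one minute if each costs tokens_per_image."""
    return int(tpm_limit // tokens_per_image)


TPM_LIMIT = 30_000  # hypothetical limit, not taken from the logs

charged = per_image_charge(7720, 70, 10)  # what the rate limiter counts
actual = 85                               # what a detail:low image consumes

allowed_now = images_per_minute(TPM_LIMIT, charged)
allowed_if_fixed = images_per_minute(TPM_LIMIT, actual)
overcharge = charged / actual
```

With these inputs the limiter counts 765 tokens per low-detail image, nine times the 85 actually consumed, so 39 images per minute are allowed where 352 would fit.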
