currently:
How do rate limits for GPT-4 with Vision work?
We process images at the token level, so each image we process counts towards your tokens per minute (TPM) limit. See the calculating costs section for details on the formula used to determine token count per image.
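For reference, the per-image billing formula that the docs point to can be sketched as below. This is a minimal sketch for gpt-4o only, not the service's actual code. The constants (85 base tokens, 170 per 512px tile, the 2048px and 768px resize steps) follow the published cost description. The function name `image_tokens`, the rounding, and the "never upscale small images" behaviour are assumptions, chosen because they match the prompt usage figures in the logs further down.

```python
import math

def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Approximate billed tokens for one image on gpt-4o (sketch, not official code)."""
    if detail == "low":
        return 85  # flat cost, independent of image size

    w, h = width, height
    # Step 1: shrink to fit inside 2048x2048 (assumption: never enlarge).
    long_side = max(w, h)
    if long_side > 2048:
        w, h = round(w * 2048 / long_side), round(h * 2048 / long_side)
    # Step 2: shrink so the shortest side is at most 768 (assumption: never enlarge).
    short_side = min(w, h)
    if short_side > 768:
        w, h = round(w * 768 / short_side), round(h * 768 / short_side)
    # Step 3: count 512px tiles; 170 tokens per tile plus an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

for size in [(400, 400), (800, 800), (1200, 800)]:
    print(size, image_tokens(*size), image_tokens(*size, detail="low"))
```

With these assumptions a 1200x800 image comes out at 1105 tokens at high detail and 85 at low, which is within a few tokens of the gpt-4o prompt usage in the logs once the 45-token text portion is subtracted.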
In fact, the rate limit impact of both text tokens and image tokens is an estimate.
For images, it is a very poor estimate; really, it is no estimate at all.
A fixed impact to the rate limit is used, solely depending on whether detail “high” or “low” is specified. Image contents are not inspected, even when sent as base64.
It seems that currently, for either high or low detail, around 800 tokens per image are counted against the rate limit, even though a low-detail image actually consumes 75-85 tokens and a high-detail image can approach 2000.
Source: x-ratelimit-remaining-tokens
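One plausible way to turn that header into a "rate usage" number is sketched here. It assumes the token bucket was full before the request, so the difference between `x-ratelimit-limit-tokens` and `x-ratelimit-remaining-tokens` is the charge for that one call. The helper name `rate_usage` and the example header values are hypothetical; this is not necessarily how the figures in this post were collected.

```python
def rate_usage(headers: dict) -> int:
    """Tokens deducted from the TPM bucket by one request.

    Assumption: the bucket was full before the call, otherwise the
    result also includes whatever earlier requests consumed.
    """
    # Header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    limit = int(h["x-ratelimit-limit-tokens"])
    remaining = int(h["x-ratelimit-remaining-tokens"])
    return limit - remaining

# Hypothetical header values, for illustration only.
example_headers = {
    "x-ratelimit-limit-tokens": "30000",
    "x-ratelimit-remaining-tokens": "29157",
}
print(rate_usage(example_headers))
```

Because the bucket refills continuously, readings taken back to back can differ by a token or two, so small jitter in numbers measured this way is to be expected.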
Logging, where “estimated tokens” is my client-side token counting of multi-modal messages, “prompt usage” is the usage returned by the API (streaming), and “rate usage” is the impact on the TPM reported by the headers. Images are generated algorithmically and sent as base64:
gpt-4o-2024-08-06; detail:high
Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x400, Images: 1, Prompt usage: 307, Estimated tokens: 307, Rate usage: 843
Size: 400x800, Images: 1, Prompt usage: 477, Estimated tokens: 477, Rate usage: 843
Size: 800x400, Images: 1, Prompt usage: 477, Estimated tokens: 477, Rate usage: 843
Size: 800x800, Images: 1, Prompt usage: 817, Estimated tokens: 817, Rate usage: 843
Size: 1200x400, Images: 1, Prompt usage: 647, Estimated tokens: 647, Rate usage: 843
Size: 1200x800, Images: 1, Prompt usage: 1157, Estimated tokens: 1157, Rate usage: 843
Size: 1600x400, Images: 1, Prompt usage: 817, Estimated tokens: 817, Rate usage: 843
Size: 1600x800, Images: 1, Prompt usage: 1157, Estimated tokens: 1157, Rate usage: 843
Size: 400x400, Images: 10, Prompt usage: 2674, Estimated tokens: 2674, Rate usage: 7788
Size: 400x800, Images: 10, Prompt usage: 4374, Estimated tokens: 4374, Rate usage: 7788
Size: 800x400, Images: 10, Prompt usage: 4374, Estimated tokens: 4374, Rate usage: 7788
Size: 800x800, Images: 10, Prompt usage: 7774, Estimated tokens: 7774, Rate usage: 7788
Size: 1200x400, Images: 10, Prompt usage: 6074, Estimated tokens: 6074, Rate usage: 7788
Size: 1200x800, Images: 10, Prompt usage: 11174, Estimated tokens: 11174, Rate usage: 7788
Size: 1600x400, Images: 10, Prompt usage: 7774, Estimated tokens: 7774, Rate usage: 7788
Size: 1600x800, Images: 10, Prompt usage: 11174, Estimated tokens: 11174, Rate usage: 7788
gpt-4o-2024-08-06; detail:low
Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 70
Size: 400x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 400x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 800x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 800x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1200x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1200x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1600x400, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 1600x800, Images: 1, Prompt usage: 130, Estimated tokens: 130, Rate usage: 835
Size: 400x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 400x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 800x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 800x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1200x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1200x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1600x400, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
Size: 1600x800, Images: 10, Prompt usage: 895, Estimated tokens: 895, Rate usage: 7720
gpt-4o-mini-2024-07-18; detail:high
Size: 400x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 800x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1200x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1600x400, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 1600x800, Images: 0, Prompt usage: 45, Estimated tokens: 45, Rate usage: 71
Size: 400x400, Images: 1, Prompt usage: 8545, Estimated tokens: 300, Rate usage: 835
Size: 400x800, Images: 1, Prompt usage: 14212, Estimated tokens: 470, Rate usage: 835
Size: 800x400, Images: 1, Prompt usage: 14212, Estimated tokens: 470, Rate usage: 835
Size: 800x800, Images: 1, Prompt usage: 25546, Estimated tokens: 810, Rate usage: 835
Size: 1200x400, Images: 1, Prompt usage: 19879, Estimated tokens: 640, Rate usage: 835
Size: 1200x800, Images: 1, Prompt usage: 36880, Estimated tokens: 1150, Rate usage: 835
Size: 1600x400, Images: 1, Prompt usage: 25546, Estimated tokens: 810, Rate usage: 835
Size: 1600x800, Images: 1, Prompt usage: 36880, Estimated tokens: 1150, Rate usage: 835
Size: 400x400, Images: 10, Prompt usage: 85045, Estimated tokens: 2595, Rate usage: 7721
Size: 400x800, Images: 10, Prompt usage: 141715, Estimated tokens: 4295, Rate usage: 7720
Size: 800x400, Images: 10, Prompt usage: 141715, Estimated tokens: 4295, Rate usage: 7721
Size: 800x800, Images: 10, Prompt usage: 255055, Estimated tokens: 7695, Rate usage: 7720
Size: 1200x400, Images: 10, Prompt usage: 198385, Estimated tokens: 5995, Rate usage: 7721
Size: 1200x800, Images: 10, Prompt usage: 368395, Estimated tokens: 11095, Rate usage: 7720
detail:low used to get its own, lower overestimate. The obvious fix would be to set it to 85 tokens in the rate limiter, so that a stream of low-detail images, such as video frames, isn’t blocked at 1/10th of the rate it should be allowed.
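The effect on throughput can be put into numbers with a little arithmetic. The sketch below takes the per-image charge seen in the detail:low logs (835 with one image versus about 70 with none, so roughly 765 per image) and compares it with the 85 tokens actually billed. The 30,000 TPM limit and the function name `images_per_minute` are assumptions for illustration only; substitute your own tier's limit.

```python
def images_per_minute(tpm_limit: int, per_image_charge: int) -> int:
    """How many images fit into one minute's token budget, ignoring text tokens."""
    return tpm_limit // per_image_charge

TPM_LIMIT = 30_000        # hypothetical tier limit, not taken from the logs
RATE_LIMITER_CHARGE = 765  # observed in the logs: 835 - 70 for one low-detail image
BILLED_CHARGE = 85         # what a low-detail image is actually billed on gpt-4o

throttled = images_per_minute(TPM_LIMIT, RATE_LIMITER_CHARGE)
expected = images_per_minute(TPM_LIMIT, BILLED_CHARGE)
print(f"rate limiter allows {throttled} images/min, billing implies {expected}")
print(f"ratio: {throttled / expected:.2f}")
```

Under that assumed limit the rate limiter admits 39 low-detail images per minute where the billed cost would allow 352, which is where the roughly 1/10th figure comes from.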