Image-based inference with GPT-4o appears to be severely impaired due to excessively naive token counting

Vision and output quality are not impaired by the rate-limit mechanism.

The rate limiter works on a simple estimate. It acts like a firewall: its only function is to block API requests from reaching the AI models when the limits you set, or your organization's limits, would be exceeded.

Text tokens are estimated; the estimate is close to, but not exactly, the actual count.
Images consume a fixed amount against the rate limit regardless of any settings: 771 tokens per image.

Because it must block excessive requests cheaply, before they are processed, the limiter performs neither deep inspection nor any advanced computation on the API request. That means that yes, some requests count against your accumulated rate by more than their actual AI consumption, while others count by less than their true usage.
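As a sketch only, the following Python mirrors the kind of shallow pre-check described above. The four-characters-per-token divisor and both function names are illustrative assumptions, not OpenAI's actual implementation; only the 771-tokens-per-image figure comes from the behavior described above.

```python
def estimate_request_tokens(text: str, num_images: int) -> int:
    # Cheap text estimate: roughly one token per four characters
    # (an assumed heuristic; no real tokenizer is run).
    text_tokens = len(text) // 4
    # Flat per-image charge, regardless of image size or detail setting.
    image_tokens = 771 * num_images
    return text_tokens + image_tokens

def admit_request(text: str, num_images: int, remaining_tpm: int) -> bool:
    # Firewall-style gate: block the request before it ever reaches
    # the model if the estimate would exceed the remaining budget.
    return estimate_request_tokens(text, num_images) <= remaining_tpm
```

The whole point of a gate like this is that it runs in microseconds, which is exactly why it cannot match the model's true tokenizer-level accounting.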

Despite all the evidence you've posted for some unknown theory, the rate limiter, and your requests being rejected for exceeding it, have nothing to do with the quality of vision or output.

When your requests are formed correctly, you can in many cases send multiple images in a single user message and get a distinct classification or description for each; quality tends to degrade beyond five or ten images, though.
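For example, with the openai Python SDK (v1.x), a multi-image request is just one user message whose content list holds a text part followed by several image_url parts. The URLs and prompt text here are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

image_urls = [
    "https://example.com/photo1.jpg",  # placeholder URLs
    "https://example.com/photo2.jpg",
]

# One user message: a text part plus one image part per image.
content = [{"type": "text", "text": "Describe each image separately, numbered in order."}]
for url in image_urls:
    content.append({"type": "image_url", "image_url": {"url": url}})

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

Numbering the images in the prompt, as above, helps keep each description tied to the right image.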
