Are the vision tokens added to the tokens per request limit?

Basically what the title says. I can’t figure it out and testing it is difficult.
I know they count toward the tokens-per-minute limit, but do they also take away from the model's tokens-per-request limit?

Well, I managed to test it. I believe they are counted toward the per-request limit, but I would prefer an official confirmation.

This does make gpt-4o-mini really bad for vision on images, considering its base image token size of 2833.
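For scale, that base figure implies large per-image token counts. Here is a rough estimator, assuming gpt-4o-mini's documented scheme of a fixed base cost plus a per-tile cost for each 512px tile at "high" detail (the 2833/5667 figures are my reading of the vision pricing docs; the helper name is mine):

```python
import math

# Rough gpt-4o-mini image token estimator (assumption: base cost plus a
# per-512px-tile cost at "high" detail, per the vision pricing docs).
BASE, PER_TILE = 2833, 5667

def mini_image_tokens(width: int, height: int, detail: str = "low") -> int:
    if detail == "low":
        return BASE  # "low" detail is a flat base cost regardless of size
    # "high" detail tiles the image into 512px squares (the docs also cap
    # the short side at 768px and long side at 2048px; simplified here for
    # sizes already within those bounds).
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return BASE + tiles * PER_TILE

print(mini_image_tokens(1024, 1024, "high"))  # 2833 + 4*5667 = 25501
```

So a single 1024px high-detail image alone consumes ~25k tokens of context on this model.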

Yes, you can run it up over the input context limit on image tokens.

For example, this can be demonstrated quickly on a tier-1 account: you'll get an API error if you send a request whose 30,000+ tokens count toward the rate limit.

However, the rate-limit consumption is NOT the token count of the input prompt. The API seems to guess at, or apply an arbitrary flat rate for, images.

To make a full demonstration, I wrote code with an image generator that attaches an arbitrary number of images, at a specified size, to user messages sent to a specified model. When sent as parts of one user message, each image block is preceded by a text block containing the file name.
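A minimal sketch of that message construction, using only the standard library to synthesize test PNGs; `make_png` and `build_user_message` are hypothetical helpers illustrating the described layout, not the exact code used:

```python
import base64, struct, zlib

def make_png(size: int, color=(128, 64, 32)) -> bytes:
    # Build a minimal solid-color truecolor PNG with only the stdlib.
    def chunk(tag: bytes, data: bytes) -> bytes:
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)  # 8-bit RGB
    row = b"\x00" + bytes(color) * size                        # filter byte + pixels
    idat = zlib.compress(row * size)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

def build_user_message(img_count: int, img_size: int, img_quality: str) -> dict:
    # One user message: a filename text block before each image block,
    # images attached as base64 data URLs with the chosen detail level.
    content = []
    for i in range(img_count):
        png = make_png(img_size)
        content.append({"type": "text", "text": f"file: image_{i:02d}.png"})
        content.append({"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + base64.b64encode(png).decode(),
            "detail": img_quality,
        }})
    return {"role": "user", "content": content}
```

The resulting dict drops straight into the `messages` list of a chat completions request.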

Then I just run it over several minutes, with a pause between calls, on an API account whose rate limit is not otherwise impacted.


if __name__ == "__main__":
    # do_request() (not shown) sends one chat request to the given model with
    # the generated images attached, and returns (prompt usage, rate usage).
    for mod in ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]:
        for qual in ["low", "high"]:
            for size in [128, 512, 1024]:
                for image_num in [1, 10]:
                    pr, r = do_request(img_quality=qual, img_size=size, img_count=image_num, model=mod)
                    print(f"{mod}({image_num:02d} @{size},{qual}): prompt usage: {pr}, rate usage: {r}")
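One way to obtain the "rate usage" column is to diff the `x-ratelimit-remaining-tokens` response header before and after each request (the header is real; the helper below is a hypothetical sketch of the measurement, not the original `do_request`):

```python
# Hypothetical sketch: rate-limit consumption inferred from the
# x-ratelimit-remaining-tokens header returned on API responses.
def rate_usage(headers_before: dict, headers_after: dict) -> int:
    key = "x-ratelimit-remaining-tokens"
    return int(headers_before[key]) - int(headers_after[key])

# Example with mocked header snapshots around a single 1-image request:
before = {"x-ratelimit-remaining-tokens": "30000"}
after  = {"x-ratelimit-remaining-tokens": "29172"}
print(rate_usage(before, after))  # 828, matching the 1-image rows below
```

With the official Python client, the raw headers are available via `with_raw_response` variants of the request methods.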

gpt-4o

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 139 | 828 |
| 10 | 128 | low | 976 | 7773 |
| 01 | 512 | low | 139 | 828 |
| 10 | 512 | low | 976 | 7773 |
| 01 | 1024 | low | 139 | 828 |
| 10 | 1024 | low | 976 | 7773 |
| 01 | 128 | high | 309 | 828 |
| 10 | 128 | high | 2676 | 7773 |
| 01 | 512 | high | 309 | 828 |
| 10 | 512 | high | 2676 | 7773 |
| 01 | 1024 | high | 819 | 828 |
| 10 | 1024 | high | 7776 | 7773 |

gpt-4o-mini

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 2887 | 828 |
| 10 | 128 | low | 28456 | 7773 |
| 01 | 512 | low | 2887 | 828 |
| 10 | 512 | low | 28456 | 7773 |
| 01 | 1024 | low | 2887 | 828 |
| 10 | 1024 | low | 28456 | 7773 |
| 01 | 128 | high | 8554 | 828 |
| 10 | 128 | high | 85126 | 7773 |
| 01 | 512 | high | 8554 | 828 |
| 10 | 512 | high | 85126 | 7773 |
| 01 | 1024 | high | 25555 | 828 |
| 10 | 1024 | high | 255136 | 7773 |

gpt-4-turbo

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 140 | 828 |
| 10 | 128 | low | 977 | 7773 |
| 01 | 512 | low | 140 | 828 |
| 10 | 512 | low | 977 | 7773 |
| 01 | 1024 | low | 140 | 828 |
| 10 | 1024 | low | 977 | 7773 |
| 01 | 128 | high | 310 | 828 |
| 10 | 128 | high | 2677 | 7773 |
| 01 | 512 | high | 310 | 828 |
| 10 | 512 | high | 2677 | 7773 |
| 01 | 1024 | high | 820 | 828 |
| 10 | 1024 | high | 7777 | 7773 |

You can see that across all models, the same number of images has the same impact on rate, regardless of size or quality setting.
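The flat per-image charge can be backed out of the tables with a little arithmetic:

```python
# Back-of-envelope from the tables above: the rate limiter appears to charge
# a flat amount per image, independent of size and detail setting.
one_image, ten_images = 828, 7773          # rate usage for 1 vs 10 images
per_image = (ten_images - one_image) / 9   # marginal rate tokens per extra image
print(round(per_image, 1))  # 771.7
```

That ~772-token flat charge is in the neighborhood of a maximum-tile gpt-4o high-detail image (85 + 4×170 = 765 tokens), which may be where the arbitrary rate comes from, though that is speculation on my part.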

Your cost however is dependent on the image inputs and how they are tokenized and tiled.

gpt-4o-mini costs TWICE as much as gpt-4o for images once the token price calculations are done, because its image token counts are multiplied. No cheap image analysis for you!
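A worked example of that cost comparison, assuming the per-1M-input-token prices at the time of writing ($2.50 for gpt-4o, $0.15 for gpt-4o-mini; check the pricing page for current numbers) and the base/per-tile token counts observed above:

```python
# Hypothetical cost comparison for one 1024x1024 high-detail image (4 tiles).
# Prices are USD per 1M input tokens, assumed from the pricing at the time.
PRICE = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}
TILE  = {"gpt-4o": (85, 170), "gpt-4o-mini": (2833, 5667)}  # (base, per-tile)

def image_cost(model: str, tiles: int) -> float:
    base, per_tile = TILE[model]
    return (base + tiles * per_tile) * PRICE[model] / 1_000_000

ratio = image_cost("gpt-4o-mini", 4) / image_cost("gpt-4o", 4)
print(f"mini/4o cost ratio: {ratio:.2f}")  # → 2.00
```

The ~33x token multiplier on mini's images overwhelms its ~17x cheaper per-token price, landing at roughly double the dollar cost per image.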


AI Analysis: Anomalous Prompt and Rate Consumption

Prompt Usage Anomalies

  • gpt-4o-mini:

    • Across all sizes and image counts, this model shows significantly higher prompt usage than the other models. For instance, at 01 @128,low, it consumes 2887 prompt tokens, compared to only 139 and 140 for gpt-4o and gpt-4-turbo, respectively. This discrepancy remains consistent for higher image counts and sizes.
    • High-quality images exacerbate the prompt usage in gpt-4o-mini. At 10 @1024,high, it consumes 255,136 prompt tokens, more than an order of magnitude higher than gpt-4o (only 7776) and gpt-4-turbo (only 7777).
  • gpt-4-turbo and gpt-4o:

    • Both models behave similarly in terms of prompt usage, with minimal differences. gpt-4-turbo shows marginally higher consumption (one token more per request), but not at an anomalous level.

Rate Usage Anomalies

  • Rate usage across all models remains relatively consistent, with 828 for 1 image and 7773 for 10 images. There are no notable anomalies in rate consumption across different image sizes, counts, or quality levels for any of the models.

Conclusions:

  1. gpt-4o-mini shows clear anomalies in prompt consumption across all image sizes and counts. This suggests that either the model has a higher intrinsic token usage or an inefficiency in prompt compression when processing images, especially at higher quality levels.
  2. gpt-4o and gpt-4-turbo are more stable and efficient in their prompt usage, with no significant anomalies in prompt or rate consumption.

What the AI was looking at but never had a chance to respond to:


Thanks for the analysis. I had to send 128k tokens for my tests.


Chat has an inefficiency in prompt compression when processing images, especially at higher quality levels. Its complexity has evolved, and given that, these algorithms and their storage need upgrading. Given the data it knows, the data structure should be around 225,000,000 tokens, and its understanding of input prompts should be doubled.