Are the vision tokens added to the tokens per request limit?

Basically what the title says. I can’t figure it out and testing it is difficult.
I know they count toward the tokens-per-minute limit, but do they also take away from the model's tokens-per-request limit?

Well, I managed to test it. I believe they are counted toward the per-request limit, but I would prefer an official confirmation.

This does make gpt-4o-mini really bad for vision on images, considering its base image token size of 2833.
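For scale, that base figure implies large per-image token counts. Here is a rough estimator, assuming gpt-4o-mini's documented scheme of a fixed base cost plus a per-tile cost for each 512px tile at "high" detail (the 2833/5667 figures are my reading of the vision pricing docs; the helper name is mine):

```python
import math

# Rough gpt-4o-mini image token estimator (assumption: base cost plus a
# per-512px-tile cost at "high" detail, per the vision pricing docs).
BASE, PER_TILE = 2833, 5667

def mini_image_tokens(width: int, height: int, detail: str = "low") -> int:
    if detail == "low":
        return BASE  # "low" detail is a flat base cost regardless of size
    # "high" detail tiles the image into 512px squares (the docs also cap
    # the short side at 768px and long side at 2048px; simplified here for
    # sizes already within those bounds).
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return BASE + tiles * PER_TILE

print(mini_image_tokens(1024, 1024, "high"))  # 2833 + 4*5667 = 25501
```

So a single 1024px high-detail image alone consumes ~25k tokens of context on this model.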

Yes, you can run it up over the input context limit on image tokens.

For example, this can be demonstrated quickly on a tier-1 account: you'll get an API error if you send a request whose 30,000+ tokens count toward the rate limit.

However, the rate-limit consumption is NOT the token count of the input prompt. The API seems to guess at, or apply an arbitrary flat rate for, images.

To make a full demonstration, I wrote code with an image generator that attaches an arbitrary number of images, at a specified size, to user messages sent to a specified model. When sent as parts of one user message, each image block is preceded by a text block containing the file name.
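A minimal sketch of that message construction, using only the standard library to synthesize test PNGs; `make_png` and `build_user_message` are hypothetical helpers illustrating the described layout, not the exact code used:

```python
import base64, struct, zlib

def make_png(size: int, color=(128, 64, 32)) -> bytes:
    # Build a minimal solid-color truecolor PNG with only the stdlib.
    def chunk(tag: bytes, data: bytes) -> bytes:
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)  # 8-bit RGB
    row = b"\x00" + bytes(color) * size                        # filter byte + pixels
    idat = zlib.compress(row * size)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

def build_user_message(img_count: int, img_size: int, img_quality: str) -> dict:
    # One user message: a filename text block before each image block,
    # images attached as base64 data URLs with the chosen detail level.
    content = []
    for i in range(img_count):
        png = make_png(img_size)
        content.append({"type": "text", "text": f"file: image_{i:02d}.png"})
        content.append({"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + base64.b64encode(png).decode(),
            "detail": img_quality,
        }})
    return {"role": "user", "content": content}
```

The resulting dict drops straight into the `messages` list of a chat completions request.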

Then I just run it over several minutes, with a pause between calls, on an API account whose rate limit is not otherwise impacted.


if __name__ == "__main__":
    # do_request() (not shown) sends one chat request to the given model with
    # the generated images attached, and returns (prompt usage, rate usage).
    for mod in ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]:
        for qual in ["low", "high"]:
            for size in [128, 512, 1024]:
                for image_num in [1, 10]:
                    pr, r = do_request(img_quality=qual, img_size=size, img_count=image_num, model=mod)
                    print(f"{mod}({image_num:02d} @{size},{qual}): prompt usage: {pr}, rate usage: {r}")
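One way to obtain the "rate usage" column is to diff the `x-ratelimit-remaining-tokens` response header before and after each request (the header is real; the helper below is a hypothetical sketch of the measurement, not the original `do_request`):

```python
# Hypothetical sketch: rate-limit consumption inferred from the
# x-ratelimit-remaining-tokens header returned on API responses.
def rate_usage(headers_before: dict, headers_after: dict) -> int:
    key = "x-ratelimit-remaining-tokens"
    return int(headers_before[key]) - int(headers_after[key])

# Example with mocked header snapshots around a single 1-image request:
before = {"x-ratelimit-remaining-tokens": "30000"}
after  = {"x-ratelimit-remaining-tokens": "29172"}
print(rate_usage(before, after))  # 828, matching the 1-image rows below
```

With the official Python client, the raw headers are available via `with_raw_response` variants of the request methods.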

gpt-4o

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 139 | 828 |
| 10 | 128 | low | 976 | 7773 |
| 01 | 512 | low | 139 | 828 |
| 10 | 512 | low | 976 | 7773 |
| 01 | 1024 | low | 139 | 828 |
| 10 | 1024 | low | 976 | 7773 |
| 01 | 128 | high | 309 | 828 |
| 10 | 128 | high | 2676 | 7773 |
| 01 | 512 | high | 309 | 828 |
| 10 | 512 | high | 2676 | 7773 |
| 01 | 1024 | high | 819 | 828 |
| 10 | 1024 | high | 7776 | 7773 |

gpt-4o-mini

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 2887 | 828 |
| 10 | 128 | low | 28456 | 7773 |
| 01 | 512 | low | 2887 | 828 |
| 10 | 512 | low | 28456 | 7773 |
| 01 | 1024 | low | 2887 | 828 |
| 10 | 1024 | low | 28456 | 7773 |
| 01 | 128 | high | 8554 | 828 |
| 10 | 128 | high | 85126 | 7773 |
| 01 | 512 | high | 8554 | 828 |
| 10 | 512 | high | 85126 | 7773 |
| 01 | 1024 | high | 25555 | 828 |
| 10 | 1024 | high | 255136 | 7773 |

gpt-4-turbo

| Image Count | Size | Quality | Prompt Usage | Rate Usage |
|------------:|-----:|:--------|-------------:|-----------:|
| 01 | 128 | low | 140 | 828 |
| 10 | 128 | low | 977 | 7773 |
| 01 | 512 | low | 140 | 828 |
| 10 | 512 | low | 977 | 7773 |
| 01 | 1024 | low | 140 | 828 |
| 10 | 1024 | low | 977 | 7773 |
| 01 | 128 | high | 310 | 828 |
| 10 | 128 | high | 2677 | 7773 |
| 01 | 512 | high | 310 | 828 |
| 10 | 512 | high | 2677 | 7773 |
| 01 | 1024 | high | 820 | 828 |
| 10 | 1024 | high | 7777 | 7773 |

You can see that across all models, the same number of images has the same impact on rate, regardless of size or quality setting.
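The flat per-image charge can be backed out of the tables with a little arithmetic:

```python
# Back-of-envelope from the tables above: the rate limiter appears to charge
# a flat amount per image, independent of size and detail setting.
one_image, ten_images = 828, 7773          # rate usage for 1 vs 10 images
per_image = (ten_images - one_image) / 9   # marginal rate tokens per extra image
print(round(per_image, 1))  # 771.7
```

That ~772-token flat charge is in the neighborhood of a maximum-tile gpt-4o high-detail image (85 + 4×170 = 765 tokens), which may be where the arbitrary rate comes from, though that is speculation on my part.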

Your cost however is dependent on the image inputs and how they are tokenized and tiled.

gpt-4o-mini costs TWICE as much as gpt-4o for images once the token price calculations are done, because its image token counts are multiplied. No cheap image analysis for you!
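A worked example of that cost comparison, assuming the per-1M-input-token prices at the time of writing ($2.50 for gpt-4o, $0.15 for gpt-4o-mini; check the pricing page for current numbers) and the base/per-tile token counts observed above:

```python
# Hypothetical cost comparison for one 1024x1024 high-detail image (4 tiles).
# Prices are USD per 1M input tokens, assumed from the pricing at the time.
PRICE = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}
TILE  = {"gpt-4o": (85, 170), "gpt-4o-mini": (2833, 5667)}  # (base, per-tile)

def image_cost(model: str, tiles: int) -> float:
    base, per_tile = TILE[model]
    return (base + tiles * per_tile) * PRICE[model] / 1_000_000

ratio = image_cost("gpt-4o-mini", 4) / image_cost("gpt-4o", 4)
print(f"mini/4o cost ratio: {ratio:.2f}")  # → 2.00
```

The ~33x token multiplier on mini's images overwhelms its ~17x cheaper per-token price, landing at roughly double the dollar cost per image.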


AI Analysis: Anomalous Prompt and Rate Consumption

Prompt Usage Anomalies

  • gpt-4o-mini:

    • Across all sizes and image counts, this model shows significantly higher prompt usage than the other models. For instance, at 01 @128,low, it consumes 2887 prompt tokens, compared to only 139 and 140 for gpt-4o and gpt-4-turbo, respectively. This discrepancy remains consistent for higher image counts and sizes.
    • High-quality images exacerbate the prompt usage in gpt-4o-mini. At 10 @1024,high, it consumes 255,136 prompt tokens, more than an order of magnitude higher than gpt-4o (only 7776) and gpt-4-turbo (only 7777).
  • gpt-4-turbo and gpt-4o:

    • Both models behave similarly in terms of prompt usage, with minimal differences. gpt-4-turbo shows marginally higher consumption (one token more per request), but not at an anomalous level.

Rate Usage Anomalies

  • Rate usage across all models remains relatively consistent, with 828 for 1 image and 7773 for 10 images. There are no notable anomalies in rate consumption across different image sizes, counts, or quality levels for any of the models.

Conclusions:

  1. gpt-4o-mini shows clear anomalies in prompt consumption across all image sizes and counts. This suggests that either the model has a higher intrinsic token usage or an inefficiency in prompt compression when processing images, especially at higher quality levels.
  2. gpt-4o and gpt-4-turbo are more stable and efficient in their prompt usage, with no significant anomalies in prompt or rate consumption.

What the AI was looking at but never had a chance to respond to:


Thanks for the analysis. I had to send 128k tokens for my tests.


Chat has an inefficiency in prompt compression when processing images, especially at higher quality levels. Its complexity has evolved, and given that, these algorithms and their storage need upgrading. Given the data it knows, the data structure should be around 225,000,000 tokens, and its understanding of input prompts should be doubled.