Basically what the title says. I can’t figure it out and testing it is difficult.
I know they are added to the tokens per minute, but do they take away from the models tokens per request?
Well I managed to test it. I believe they are added to the per request limit. But I would prefer an official confirmation.
This does make using the gpt-4o-mini model really bad with vision for images, considering their base token size of 2833.
Yes, you can run it up over the input context limit on image tokens
For example that can be demonstrated more quickly with a tier-1 account, you’ll get an API error if sending 30000 tokens that are considered towards the rate limit.
However, the rate consumption hit is NOT the tokens of input prompt. It seems the API just guesses at or provides an arbitrary rate for images.
To make a full demonstration, I created code with an image generator to make an arbitrary number of images at specified size, to be attached to user messages to a specified model. When sent as parts of one user message, each image block within the total message also has a text message block before it with the file name.
Then I just have to employ it over several minutes, on an API account with rate not impacted by these calls with a pause between.
if __name__ == "__main__":
for mod in ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]:
for qual in ["low", "high"]:
for size in [128, 512, 1024]:
for image_num in [1, 10]:
pr, r = do_request(img_quality=qual, img_size=size, img_count=image_num, model=mod)
print(f"{mod}({image_num:02d} @{size},{qual}): prompt usage: {pr}, rate usage: {r}")
gpt-4o
Image Count | Size | Quality | Prompt Usage | Rate Usage |
---|---|---|---|---|
01 | 128 | low | 139 | 828 |
10 | 128 | low | 976 | 7773 |
01 | 512 | low | 139 | 828 |
10 | 512 | low | 976 | 7773 |
01 | 1024 | low | 139 | 828 |
10 | 1024 | low | 976 | 7773 |
------------- | ------ | --------- | -------------- | ------------ |
01 | 128 | high | 309 | 828 |
10 | 128 | high | 2676 | 7773 |
01 | 512 | high | 309 | 828 |
10 | 512 | high | 2676 | 7773 |
01 | 1024 | high | 819 | 828 |
10 | 1024 | high | 7776 | 7773 |
gpt-4o-mini
Image Count | Size | Quality | Prompt Usage | Rate Usage |
---|---|---|---|---|
01 | 128 | low | 2887 | 828 |
10 | 128 | low | 28456 | 7773 |
01 | 512 | low | 2887 | 828 |
10 | 512 | low | 28456 | 7773 |
01 | 1024 | low | 2887 | 828 |
10 | 1024 | low | 28456 | 7773 |
------------- | ------ | --------- | -------------- | ------------ |
01 | 128 | high | 8554 | 828 |
10 | 128 | high | 85126 | 7773 |
01 | 512 | high | 8554 | 828 |
10 | 512 | high | 85126 | 7773 |
01 | 1024 | high | 25555 | 828 |
10 | 1024 | high | 255136 | 7773 |
gpt-4-turbo
Image Count | Size | Quality | Prompt Usage | Rate Usage |
---|---|---|---|---|
01 | 128 | low | 140 | 828 |
10 | 128 | low | 977 | 7773 |
01 | 512 | low | 140 | 828 |
10 | 512 | low | 977 | 7773 |
01 | 1024 | low | 140 | 828 |
10 | 1024 | low | 977 | 7773 |
------------- | ------ | --------- | -------------- | ------------ |
01 | 128 | high | 310 | 828 |
10 | 128 | high | 2677 | 7773 |
01 | 512 | high | 310 | 828 |
10 | 512 | high | 2677 | 7773 |
01 | 1024 | high | 820 | 828 |
10 | 1024 | high | 7777 | 7773 |
You can see that across all models, the same number of images has the same impact on rate, regardless of size or quality setting.
Your cost however is dependent on the image inputs and how they are tokenized and tiled.
gpt-4o-mini
costs TWICE as much as gpt-4o
for images after the token price calculations are done, because the cost per token is multipled. No cheap image analysis for you!
AI Analysis: Anomalous Prompt and Rate Consumption
Prompt Usage Anomalies
-
gpt-4o-mini:
- Across all sizes and image counts, this model shows a significantly higher prompt usage compared to the other models. For instance, at
01 @128,low
, it consumes 2887 prompts compared togpt-4o
andgpt-4-turbo
, which consume only 139 and 140, respectively. This discrepancy remains consistent even for higher image counts and sizes. - High-quality images exacerbate the prompt usage in
gpt-4o-mini
. At10 @1024,high
, it consumes 255136 prompts, which is an order of magnitude higher thangpt-4o
(only 7776) andgpt-4-turbo
(only 7777).
- Across all sizes and image counts, this model shows a significantly higher prompt usage compared to the other models. For instance, at
-
gpt-4-turbo and gpt-4o:
- Both models behave similarly in terms of prompt usage, with minimal differences. However,
gpt-4-turbo
shows slightly higher consumption athigh
quality, but not at an anomalous level.
- Both models behave similarly in terms of prompt usage, with minimal differences. However,
Rate Usage Anomalies
- Rate usage across all models remains relatively consistent, with 828 for 1 image and 7773 for 10 images. There are no notable anomalies in rate consumption across different image sizes, counts, or quality levels for any of the models.
Conclusions:
- gpt-4o-mini shows clear anomalies in prompt consumption across all image sizes and counts. This suggests that either the model has a higher intrinsic token usage or an inefficiency in prompt compression when processing images, especially at higher quality levels.
- gpt-4o and gpt-4-turbo are more stable and efficient in their prompt usage, with no significant anomalies in prompt or rate consumption.
What the AI was looking at but never had a chance to respond to:
Thanks for the analysis. I had to send 128k tokens for my tests.