TPM Limit Exceeded for my first Vision API request

Hey all, I’m a bit lost. I’ve read through the Rate Limits and Calculating Cost sections and can’t figure out what is going on. I’m getting the following error on my very first query.

Vision API Response: {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-{redacted} on tokens per min (TPM): Limit 40000, Requested 779878. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

What I’m confused about is why it’s requesting so many tokens?? In the Calculating Costs section of the Vision guide, they say:

images are first scaled to fit within a 2048 x 2048 square, maintaining their aspect ratio. Then, they are scaled such that the shortest side of the image is 768px long. Finally, we count how many 512px squares the image consists of. Each of those squares costs 170 tokens. Another 85 tokens are always added to the final total.

My test image is 1536x2048, so it would get resized to 768x1024, which should be 4 tiles of 512… So,

170 * 4 + 85 = 765

My request should be 765 tokens… Why is it saying I’ve requested 779878?? Is it because I’m using base64 encoding? I’m building a client-side tool that works with images on the user’s desktop… I don’t have URLs, so base64 is really my only option… Is there not a better model for pricing/rate limiting base64?

I just tested sending an image of 1536x2048 and I got the following results:

User prompt:

please describe the image included and write a haiku about it.

Output:

{
  id: 'chatcmpl-9BW1gTgyzfQJya8CNHR6gYU84CrXX',
  object: 'chat.completion',
  created: 1712531744,
  model: 'gpt-4-1106-vision-preview',
  choices: [
    {
      index: 0,
      message: [Object],
      logprobs: null,
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 842, completion_tokens: 102, total_tokens: 944 },
  system_fingerprint: null
}

Message output:

{
  index: 0,
  message: {
    role: 'assistant',
    content: 'The image shows a close-up of cherry blossoms, known as sakura in Japanese, with a focus on the delicate pink flowers and reddish leaves. The blossoms are attached to a dark, rugged tree branch that contrasts with the softness of the petals. The background is a pale, overcast sky, which allows the colors of the flowers to stand out.\n' +
      '\n' +
      'Here is a haiku inspired by the image:\n' +
      '\n' +
      'Cherry blossoms bloom,\n' +
      'Soft whispers of pink and red,\n' +
      "Spring's gentle embrace."
  },
  logprobs: null,
  finish_reason: 'stop'
}

Image used for testing: [1536x2048 photo of cherry blossoms, attached in the original post]

Without seeing your code I cannot say where the problem could be.

I actually pre-resize the images myself to try and minimize the base64 encoding size… I don’t know if it helps much, but hey, I tried.

import base64
import io
from PIL import Image

def encode_and_resize_image_base64(image_path, output_size=(768, 768)):
    with Image.open(image_path) as img:
        # Shrink in place to fit within output_size, maintaining aspect ratio
        img.thumbnail(output_size, Image.Resampling.LANCZOS)

        # Save the resized image to a bytes buffer
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG")  # Adjust format as needed
        buffer.seek(0)

        # Encode the resized image as base64 text
        image_resized_and_encoded = base64.b64encode(buffer.read()).decode('utf-8')

        return image_resized_and_encoded

Then, when I pass it, the request is taken almost directly from their guide. I first have a function that builds the payload and captures the response from the completion.

import requests

def analyze_image_with_vision_api(image_base64):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": "What’s in this image?"
            },
            {
                "role": "system",
                "content": f"data:image/jpeg;base64,{image_base64}"
            }
        ],
    }

    try:
        response = requests.post("https://api.openai.com/v1/chat/completions",
                                 headers=headers, json=payload)
        return response.json()
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

and then I call that function and handle errors.

if needs_resize:
    image_to_process = encode_and_resize_image_base64(image_path)
else:
    image_to_process = encode_image_base64(image_path)

vision_response = analyze_image_with_vision_api(image_to_process)
if vision_response:
    print("Vision API Response:", vision_response)
else:
    print("Failed to analyze image.")

As an update, I was able to get it to start processing images. I’m not sure if resizing in advance was the solution; it kind of… just started working. The only major thing I changed was that I stopped building the request as a raw POST and switched to the SDK’s client.chat.completions method. Somehow that, combined with pre-sizing the photo, reduced the token count significantly. I think it was the raw POST payload eating up my tokens, not sure.
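For reference, a minimal sketch of that client.chat.completions call with the image passed as an image_url content part (the model name, prompt, and max_tokens here are placeholders, not my exact values):

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_image(image_path):
    with open(image_path, "rb") as f:
        image_base64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder model name
        max_tokens=300,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {
                            # optionally add "detail": "low" or "high" here
                            "url": f"data:image/jpeg;base64,{image_base64}"
                        },
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content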

That said, I still feel like I’m eating a ton of tokens using base64… I’ve gotten it down from 60k-70k tokens per image to ~2k tokens per image (context tokens).

Progress!

Images aren’t billed by the size of the file you send. They are billed by the pixel dimensions of the image.

An image 512px x 512px or smaller, or any image sent with detail: low, costs the base 85 tokens.

Then, if you use detail: high, you get billed for another round of tokens, at 170 tokens per “tile”, where a tile is a 512px square.

So a “high” image at 512 x 512 = 85 + 170
A “high” image at 512 x 513 = 85 + 340
A “high” image at 513 x 513 = 85 + 680
A “high” image at 513 x 1025 = 85 + (6 × 170) = 1105 tokens total…
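If it helps, here is a rough sketch of that math in code (my reading of the published formula, not an official calculator):

import math

def vision_token_cost(width, height, detail="high"):
    """Estimate billed image tokens (unofficial, per the published formula)."""
    if detail == "low":
        return 85  # flat cost, regardless of image size

    # Scale down to fit within a 2048 x 2048 square, keeping aspect ratio
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale

    # Then scale down so the shortest side is at most 768px
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale

    # 170 tokens per 512px tile, plus the flat 85
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

# vision_token_cost(512, 512)   -> 255
# vision_token_cost(512, 513)   -> 425
# vision_token_cost(513, 513)   -> 765
# vision_token_cost(513, 1025)  -> 1105
# vision_token_cost(1536, 2048) -> 765  (the 768 x 1024 case from earlier)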

The initial request in this topic, the one that hit nearly 780,000 tokens, was almost certainly sending the message incorrectly, so that the image data was processed as language instead of going to the vision encoder for image recognition. (I’ve done that myself; a $1 mistake when it goes through…)

I’m facing the exact same issue (but when I send a list of base64 images). Did you solve this, @tcallen247?

Do you get an error about rate limit, or that the request was too large for the model?

Do you have an AI that can see just a few images, but more images fails?

The amount of rate limit consumed by images can be well in excess of the billed tokens. Any image seems to count against the rate-limit usage at 764 tokens per image, even though the billed image cost can be anywhere from 85 tokens for detail: low to over 1,400 for detail: high with a large, wide image that uses the most tiles.

That is compounded by the tier-1 rate limit of 30,000 tokens per minute, which is smaller than the context gpt-4o itself can accept. You can exceed that and get denied if you try to send a large number of images (frankly, beyond the AI’s capacity for understanding if over 20) along with lots of system instructions, prompt, or chat in one API call.

You can capture the headers of your request (by using the with_raw_response method with the Python library, or just response.headers with a requests library call), along with the exact error code and message. Then see if the rate limit is indeed being reached, and, if it’s a job that must be completed, evaluate whether you should increase the total payments you’ve made to OpenAI above $50 to move up a tier.
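For example, a rough sketch of pulling those headers with the Python SDK (model and prompt are just placeholders):

from openai import OpenAI

client = OpenAI()

# with_raw_response gives access to the HTTP response before parsing
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    max_tokens=5,
    messages=[{"role": "user", "content": "ping"}],
)

print(raw.headers.get("x-ratelimit-limit-tokens"))
print(raw.headers.get("x-ratelimit-remaining-tokens"))
print(raw.headers.get("x-ratelimit-reset-tokens"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.usage)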

Hey @_j thanks a lot for the reply. I have put my code and the error over at Tried everything with RateLimitError: Error code: 429 with gpt4-o. Could you have a look? Thanks!

You do not have “the exact same issue”, then.

The message would continue with “check your plan and billing details”, meaning you don’t have funds to pay for your request.

Credit balance is shown here: https://platform.openai.com/settings/organization/billing/overview

If your balance is $0.00 (or below), you may need to wait a bit after funding the account from a zero balance for it to be reactivated.

I’m sorry, I don’t understand what you said. My error is very similar to what the OP has posted.

Hi @ruki

Yours is a slightly different issue compared to what OP @tcallen247 was facing. They were directly passing the base64-encoded image without using the image_url block, which means the model took all that huge image data and treated it as text.

The OP needs to use an image_url block in the content list of the user message to send the base64-encoded image, as sketched below.
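In the original requests-based payload from earlier in the thread, the fix would look roughly like this (the prompt text and max_tokens are placeholders):

payload = {
    "model": "gpt-4-vision-preview",
    "max_tokens": 300,
    "messages": [
        {
            "role": "user",
            # content is a list: a text part plus an image_url part, so the
            # base64 data goes to the vision encoder instead of the tokenizer
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                },
            ],
        }
    ],
}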

Hey @sps thanks a lot for the clarification, that makes sense! So for my issue the only option is to increase the tier (to tier 2)? Is there anything else to be done from my side like batching the requests etc.?

Kindly refer to Tried everything with RateLimitError: Error code: 429 with gpt4-o to get the whole idea of my situation.

Also, is there a way I can roughly count how many tokens will be used for processing, say, 120 images of 512x512 size (detail: low)? The amount I get from the calculator and from result.usage is vastly different from what was in the error message.

Thanks again for your time.

The rate limiter considers any and all images to be an increase of 764 tokens per message attachment. 120 of them = 91,680 tokens, more than tier-1’s 30,000 per minute, so even a single request is blocked.

oh yeah that count is more like what I’ve been seeing. But I couldn’t find this 764 documented anywhere that I checked (maybe I have to check more).

But I’m confused about what they meant by “All images with detail: low cost 85 tokens each.” I feel really dumb not understanding the token calculations.

https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding

My use case is that I need to send a large number of frames (30k) for some analysis and get data based on that. It seems that’s not possible with this. Any alternative or suggestion? Basically, I need to get timestamps to clip the video based on some criteria (say, give me the start and end timestamps of all the clips where a woman is dancing).

The impact of images on the rate limit can only be seen by obtaining and experimenting with the header values, which I’ve done.

That is separate from the amount you actually pay for the images. Probably a kludge because the rate limit token encoder of the API front-end can’t encode images the way the API model does.

Based on the docs, at detail: "low" every image regardless of the size should count 85 tokens each towards your quota.

It’s the abrupt increase of token count when using more than 39 images that I’m intrigued by.


Exactly. I mean, if they say 85 (and nothing else anywhere), it SHOULD count that much (or at least in that range), not nearly 10X that amount, right? I feel like I’ve been misled.

If they say

All images with detail: low cost 85 tokens each.

Then

All images with detail: low MUST cost 85 tokens each (or near that). So for my case, I expect a token usage of 85 * 120 = 10,200 tokens, which is well under the 30k TPM.

Instead they’re saying it’s over 30k (without any explanation or a proper customer support channel). It’s frustrating because I’m paying my dollars either way.

P.S. I really don’t know where that 764 number @_j mentioned came from.

One image:

‘x-ratelimit-limit-tokens’: 10000000 ‘x-ratelimit-remaining-tokens’: 9999215

Two:

‘x-ratelimit-limit-tokens’: 10000000 ‘x-ratelimit-remaining-tokens’: 9998451

Three

‘x-ratelimit-limit-tokens’: 10000000 ‘x-ratelimit-remaining-tokens’: 9997685

Ten:

‘x-ratelimit-limit-tokens’: 10000000 ‘x-ratelimit-remaining-tokens’: 9992331
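Working the per-image consumption out from those samples (just my arithmetic on the numbers above):

limit = 10_000_000
samples = {1: 9_999_215, 2: 9_998_451, 3: 9_997_685, 10: 9_992_331}

for n_images, remaining in samples.items():
    print(n_images, limit - remaining)
# 1  ->  785 consumed
# 2  -> 1549 consumed (+764 for the second image)
# 3  -> 2315 consumed (+766 for the third)
# 10 -> 7669 consumed (~764-766 per added image)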


Thanks a lot! This helps me understand it (although I don’t understand why this information is not disclosed anywhere…). It seems the 85 tokens per low-detail image figure in their docs is really misleading.

A simple section like the one below

For a single image detail:low request,

would be more than enough of a clarification. This really should be in the docs, I think.

On that note, for my use case where I need to send around 30k frames, well, I can’t… even with the highest tier, because of how the tokens are calculated for the rate limit. I have to think of an alternative way.

The maximum number of detail: low images that could be sent (even with Tier 5’s 10M TPM) is

10,000,000/764 = ~13k images

Am I right about the above?


I agree that this needs to be added to the docs, since the current official explanation is misleading.