TPM Limit Exceeded for my first Vision API request

Hey all, I’m a bit lost. I’ve read through the Rate Limits and Calculating Costs sections and can’t figure out what’s going on. I’m getting the following error on my first query.

Vision API Response: {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-{redacted} on tokens per min (TPM): Limit 40000, Requested 779878. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

What I’m confused about is why it’s requesting so many tokens. In the Calculating Costs section of the Vision guide, they say:

images are first scaled to fit within a 2048 x 2048 square, maintaining their aspect ratio. Then, they are scaled such that the shortest side of the image is 768px long. Finally, we count how many 512px squares the image consists of. Each of those squares costs 170 tokens. Another 85 tokens are always added to the final total.

My test image is 1536x2048, so it would get resized to 768x1024, which should be 4 tiles of 512… So,

170 * 4 + 85 = 765

My request should be 765 tokens… Why is it saying I’ve requested 779878?? Is it because I’m using base64 encoding? I’m building this as a client-side tool that works with images on the user’s desktop… I don’t have URLs, so base64 is really my only option… Is there not a better pricing/rate-limiting model for base64?

I just tested sending an image of 1536x2048 and I got the following results:

User prompt:

please describe the image included and write a haiku about it.

Output:

{
  id: 'chatcmpl-9BW1gTgyzfQJya8CNHR6gYU84CrXX',
  object: 'chat.completion',
  created: 1712531744,
  model: 'gpt-4-1106-vision-preview',
  choices: [
    {
      index: 0,
      message: [Object],
      logprobs: null,
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 842, completion_tokens: 102, total_tokens: 944 },
  system_fingerprint: null
}

Message output:

{
  index: 0,
  message: {
    role: 'assistant',
    content: 'The image shows a close-up of cherry blossoms, known as sakura in Japanese, with a focus on the delicate pink flowers and reddish leaves. The blossoms are attached to a dark, rugged tree branch that contrasts with the softness of the petals. The background is a pale, overcast sky, which allows the colors of the flowers to stand out.\n' +
      '\n' +
      'Here is a haiku inspired by the image:\n' +
      '\n' +
      'Cherry blossoms bloom,\n' +
      'Soft whispers of pink and red,\n' +
      "Spring's gentle embrace."
  },
  logprobs: null,
  finish_reason: 'stop'
}

Image used for testing: [1536 x 2048 photo of cherry blossoms]

Without seeing your code, I cannot say where the problem could be.

I actually pre-resize the images myself to try and minimize the base64 encoding size… I don’t know if it helps much, but hey, I tried.

import base64
import io

from PIL import Image


def encode_and_resize_image_base64(image_path, output_size=(768, 768)):
    with Image.open(image_path) as img:
        # Maintain aspect ratio
        img.thumbnail(output_size, Image.Resampling.LANCZOS)

        # Save the resized image to a bytes buffer
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG")  # Adjust format as needed
        buffer.seek(0)

        # Encode the resized image
        image_resized_and_encoded = base64.b64encode(buffer.read()).decode('utf-8')

        return image_resized_and_encoded

The request itself is taken almost directly from their guide. I have a function that builds the payload and captures the response from the completion.

import requests

# api_key is assumed to be set elsewhere (e.g., loaded from an environment variable)

def analyze_image_with_vision_api(image_base64):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": "What’s in this image?"
            },
            {
                "role": "system",
                "content": f"data:image/jpeg;base64,{image_base64}"
            }
        ],
    }

    try:
        response = requests.post("https://api.openai.com/v1/chat/completions",
                                 headers=headers, json=payload)
        return response.json()
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

and then I call that function and handle errors.

if needs_resize:
    image_to_process = encode_and_resize_image_base64(image_path)
else:
    image_to_process = encode_image_base64(image_path)

vision_response = analyze_image_with_vision_api(image_to_process)
if vision_response:
    print("Vision API Response:", vision_response)
else:
    print("Failed to analyze image.")

As an update, I was able to get it to start processing images. I’m not sure if resizing in advance was the solution; it kind of… just started working. The only major thing I changed was migrating from the raw requests.post call to the OpenAI SDK, using client.chat.completions. Somehow that, combined with pre-sizing the photo, reduced the token count significantly. I think it was the raw POST request eating up my tokens, but I’m not sure.
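For reference, here is roughly what the SDK call looks like with the base64 image passed as an image_url content part (a sketch based on the Vision guide, not my exact code; the function name, prompt, detail setting, and max_tokens are placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def analyze_image_with_sdk(image_base64):
    # The image goes in an image_url content part as a data URL, so it is
    # billed per 512px tile rather than tokenized as raw text.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_base64}",
                            "detail": "high",  # or "low" to cap the image at 85 tokens
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content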

That said, I still feel like I’m eating a ton of tokens using base64… I’ve gotten it down from 60k-70k tokens per image to ~2k tokens per image (context tokens).

Progress!

Images don’t use tokens based on the size of the file you send. They use tokens based on the pixel dimensions of the image.

The base 85 tokens covers a 512px x 512px (or smaller) view of the image; that’s the whole cost with detail: low, and it’s included when you use detail: high.

Then, if you use detail: high, you get billed for another round of tokens on top of that, at 170 tokens per “tile”, where a tile is a 512px square.

So a “high” image at 512 x 512 = 85 + 170
A “high” image at 512 x 513 = 85 + 340
A “high” image at 513 x 513 = 85 + 680
A “high” image at 513 x 1025 = 85 + 1020 = 1105 tokens total…
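If you want to play with the math, here is a rough sketch of that rule in Python (it folds in the resize steps from the Vision guide and assumes no upscaling when the shortest side is already under 768px; the function name is made up):

import math


def vision_image_tokens(width, height, detail="high"):
    """Rough estimate of image tokens per the tile rule above."""
    if detail == "low":
        return 85
    # Scale to fit within a 2048 x 2048 square (downscale only)
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale so the shortest side is at most 768px (downscale only)
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Count 512px tiles at 170 tokens each, plus the 85-token base
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


print(vision_image_tokens(512, 512))    # 255  (85 + 170)
print(vision_image_tokens(513, 1025))   # 1105
print(vision_image_tokens(1536, 2048))  # 765, matching the math earlier in the thread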

The initial request in this topic, the one with 700,000+ tokens, was most certainly sending the message incorrectly, so that the image data was processed as language instead of being routed to vision encoding for image recognition. (I’ve done that myself; it’s about a $1 mistake when it goes through…)
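If you want to sanity-check that theory, you can tokenize the base64 string directly (a rough sketch; the file path is made up, and cl100k_base is the tokenizer used by the GPT-4 family):

import base64

import tiktoken

# Encode the test image exactly as it would be embedded in the request
with open("test_image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

# If that string ends up in a plain text field, it gets counted as text tokens
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(image_base64)))
# A JPEG of a couple of MB yields a base64 string of a few million characters,
# which tokenizes to hundreds of thousands of tokens, the same ballpark as the
# 779878 "requested" tokens in the original error.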