Token Usage for Images Remains Constant Regardless of Size - Is This a Bug?

Hi everyone,

I’ve been using the OpenAI API to process invoice summaries with images. I have a function that takes images and a prompt to generate a summary. Here’s the function I’m using:

async function processInvoiceSummary(images: string[], prompt: string) {
    // Single user message: the text prompt first, then each image
    const messages: any[] = [
        {
            role: 'user',
            content: [
                {
                    type: 'text',
                    text: prompt
                }
            ]
        }
    ];

    // Attach every image at high detail
    images.forEach((image) => {
        messages[0].content.push({
            type: 'image_url',
            image_url: {
                url: image,
                detail: 'high'
            }
        });
    });

    const completion = await openai.chat.completions.create({
        model: gptModel,
        messages: messages
    });

    const summary = completion.choices[0].message.content;
    return { summary, usage: completion.usage };
}

I am testing it with 4 images, and the result for token usage is always the same, like this:

{
  "prompt_tokens": 103310,
  "completion_tokens": 123,
  "total_tokens": 103433,
  "completion_tokens_details": { "reasoning_tokens": 0 }
}

I tried changing the image sizes from the original 1153x1536 to 3756x5000, but the token usage still remains the same.

This doesn’t seem right, based on the documentation which mentions:

  • high will enable “high res” mode, which first allows the model to see the low res image (using 85 tokens) and then creates detailed crops using 170 tokens for each 512px x 512px tile.

Am I doing something wrong here, or is this a known issue/bug? Any insights or advice would be greatly appreciated!

I just did a deep dive into what you can expect for token usage (and rate usage) for a variety of resolutions, detail settings, and models.

If the image at detail:high takes the same number of tiles, the cost will be the same. Anything from 513x513 up to 1024x1024 results in 4 overlay tiles (on top of a base “low” image).

There are also peculiarities in the internal downsizing, even at detail:high. Your image will be downsized so the shortest dimension is at most 768 pixels. Send 3000x3000 and the model sees 768x768: 4 tiles of 512x512. Send 2000x500 and the model sees 2000x500, also 4 tiles of 512x512.
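The resizing and tiling rules above can be expressed as a small calculator. This is a sketch based on the documented formula (85 base tokens plus 170 per 512px tile, fit within 2048x2048, then shortest side capped at 768px); OpenAI's exact rounding may differ by a pixel or so:

```typescript
// Sketch of the documented detail:high image-token formula.
// Assumptions: fit within a 2048x2048 square first, then cap the
// shortest side at 768px; exact rounding on OpenAI's side may differ.
function imageTokens(width: number, height: number): number {
    const BASE = 85;      // low-res base pass
    const PER_TILE = 170; // per 512x512 tile

    // Step 1: scale to fit within 2048x2048
    let scale = Math.min(1, 2048 / Math.max(width, height));
    let w = width * scale;
    let h = height * scale;

    // Step 2: scale so the shortest side is at most 768px
    scale = Math.min(1, 768 / Math.min(w, h));
    w = Math.round(w * scale);
    h = Math.round(h * scale);

    // Step 3: count 512x512 tiles
    const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
    return BASE + PER_TILE * tiles;
}

console.log(imageTokens(1153, 1536)); // 765 (4 tiles)
console.log(imageTokens(3756, 5000)); // 765 (same 4 tiles after downsizing)
console.log(imageTokens(2000, 500));  // 765 (no downsizing, still 4 tiles)
```

This is why the two resolutions in the question bill identically: both land on the same 4-tile grid after downsizing.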

Welcome @joswin86

It’s not a bug. This cost calculation is by design.

You can use the image cost calculators for your model of choice on the pricing page to calculate and get a breakdown of how the costs are calculated for an image size.

Here's the cost breakdown for a 1536x1536 image with gpt-4o-mini:

Price per 1M tokens (fixed): $0.15
Resized width: 768
Resized height: 768
512 × 512 tiles: 2 × 2
Total tiles: 4
Base tokens: 2833
Tile tokens: 5667 × 4 = 22668
Total tokens: 25501
Total price: $0.003825

And here's the cost breakdown for the same model for an image of 3756x5000:

Price per 1M tokens (fixed): $0.15
Resized width: 768
Resized height: 1023
512 × 512 tiles: 2 × 2
Total tiles: 4
Base tokens: 2833
Tile tokens: 5667 × 4 = 22668
Total tokens: 25501
Total price: $0.003825
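As a sanity check on those numbers: gpt-4o-mini appears to report image tokens at roughly 100/3 ≈ 33.33 times the base formula (85 base, 170 per tile), so that its dollar cost lines up with gpt-4o image pricing. A quick sketch, where the multiplier is inferred from the breakdown rather than an official constant:

```typescript
// Sketch: gpt-4o-mini's image tokens look like the base formula
// scaled by ~33.33 (an inferred multiplier, not an official constant).
const MINI_MULTIPLIER = 100 / 3;

const baseTokens = Math.round(85 * MINI_MULTIPLIER);  // 2833
const tileTokens = Math.round(170 * MINI_MULTIPLIER); // 5667
const totalTokens = baseTokens + tileTokens * 4;      // 4 tiles for a 768x768 resize

console.log(totalTokens); // 25501, matching the breakdown above
```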

I’m trying to get o1-preview to make an app, show what it can do. I provided all the rules and specifications.

The playground is designed to time out and take your money apparently. Over to my chatbot.

The first go was terrible: 100x500 = 16 high detail tiles? It couldn’t be improved.

Abandoned that. A second, new, extensive prompt covering every area of failure, even explaining that the origin of tiles would start at corners: close, but no cigar.

You get the idea though. How OpenAI could present this on the web.

Thanks for the insights!

Just to clarify, since my images always have the same ratio (A4 format), does that mean it doesn’t make any difference if I increase the resolution beyond 768px on the smaller side? From what I understand, as long as the smaller side exceeds 768px, the model will downsize it to 768px, and the token usage will remain the same regardless of any further increase in resolution, correct?

Appreciate the help!

You have reached the correct conclusion:

A large A4 image (in fact, any A paper size in tall aspect ratio) will always resize to 1087x768. That then consumes six high-detail tiles, as the longest dimension exceeds two tiles' width.

You can then consider the economy of sending the image as 1024x725, where the expense would drop to four high-detail tiles.

Or consider the quality increase if you were to use this strategy:

  • The page is sized to 1024x1448 by your code.
  • You take a view of the top at 1024x768, and a view of the bottom at 1024x768.
  • 88 pixels of overlap between the two images give some commonality for the vision to join.
  • Those two images placed into the same user message.
  • Paying for two four-tile images instead of one six-tile image.

The AI would have higher resolution text and more tokens of encoded image in general to contain information.
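Using the same 85-base / 170-per-tile formula, the trade-off above can be sketched numerically. This is a sketch at base-model token rates, assuming neither shape is resized further since both already fit the 768px shortest-side limit:

```typescript
// Sketch comparing token cost (85 base + 170 per 512x512 tile) of one
// full A4 page versus two overlapping 1024x768 views of the same page.
const tokensFor = (w: number, h: number): number =>
    85 + 170 * Math.ceil(w / 512) * Math.ceil(h / 512);

const singlePage = tokensFor(768, 1087);   // 6 tiles -> 1105 tokens
const twoViews = 2 * tokensFor(1024, 768); // 2 x 4 tiles -> 1530 tokens

console.log(singlePage, twoViews);
```

So the two-view strategy costs about 38% more tokens but delivers the page at substantially higher effective resolution.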


OpenAI GPT-4 computer vision image pricing

Comparative breakdown by model

How much does it cost to send the same image to different models?

Here is detail:low (detail:high can cost up to 17 times more):

Model                      1M tokens ($)  Tokens X  Per 1k Cost  Ratio
gpt-4o-2024-08-06          $2.50          1         $0.213       1
gpt-4o-mini-2024-07-18     $0.15          33.33     $0.425       2
gpt-4o-2024-05-13          $5.00          1         $0.425       2
gpt-4-turbo-2024-04-09     $10.00         1         $0.850       4
gpt-4-0125-preview         $10.00         1         $0.850       4
gpt-4-1106-vision-preview  $10.00         1         $0.850       4

  • gpt-4o-mini: the model-received image tokens are multiplied in reporting and billing
  • aliases that point to these actual model names are excluded
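The "Per 1k Cost" column above can be reproduced from first principles: 1,000 detail:low images at 85 tokens each, with gpt-4o-mini's tokens multiplied by the inferred ~33.33 reporting factor. A sketch:

```typescript
// Sketch reproducing the "Per 1k Cost" column: 1,000 detail:low images
// at 85 tokens each; gpt-4o-mini's tokens are multiplied by ~33.33
// (an inferred factor, not an official constant).
const per1kCost = (pricePerMTok: number, tokenMultiplier = 1): number =>
    (1000 * 85 * tokenMultiplier * pricePerMTok) / 1e6;

console.log(per1kCost(2.5));           // gpt-4o-2024-08-06, ~$0.21
console.log(per1kCost(0.15, 100 / 3)); // gpt-4o-mini, ~$0.43
console.log(per1kCost(10));            // gpt-4-turbo, ~$0.85
```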