Super-high token usage with gpt-4o-mini and image

I’m getting a ridiculously high token usage in the playground with gpt-4o-mini.

Here is the image:


This is the prompt:
You are a very professional image to text document extractor.
Please extract the text from this image.
A strikethrough is a horizontal line drawn through text, used to indicate the deletion of an error or the removal of text. Ensure that all strikethrough text is excluded from the output.
Try to format any tables found in the image.
Do not include page numbers, page headers, or page footers.
Please double-check to make sure that any words in all capitalized letters with strikethrough letters are excluded.
Return only the extracted text. No commentary.

Exclude Strikethrough: Do not include any strikethrough words in the output. Even if the strikethrough words are in a title.
Include Tables: Tables should be preserved in the extracted text.
Exclude Page Headers, Page Footers, and Page Numbers: Eliminate these elements which are typically not part of the main content.

This is the output:

14,734 tokens. That’s nuts!

The OpenAI tokenizer reports an output of only 130 tokens.

Is the processing of this one small image costing me over 10,000 tokens?


This is intended behavior. See:


OMG! That’s terrible! I just confirmed the cost through the API:

Array
(
    [id] => chatcmpl-9myEamJ1m3AGLWi9FuqJtBQItE5Fi
    [object] => chat.completion
    [created] => 1721458312
    [model] => gpt-4o-mini-2024-07-18
    [choices] => Array
        (
            [0] => Array
                (
                    [index] => 0
                    [message] => Array
                        (
                            [role] => assistant
                            [content] => 10. Sick Leave

Modify Article 9 of the Local #161 Motion Picture Theatrical and TV Series Production Agreement (and make conforming changes to Article 41 of the Local #161 Supplemental Digital Agreement) as follows:

"ARTICLE 9. SICK LEAVE

“(a) Paid Sick Leave in the State of New York: The following is applicable only to employees working under this Agreement in the State of New York:

“(1) Commencing employees shall accrue one (1)
                        )

                    [logprobs] => 
                    [finish_reason] => stop
                )

        )

    [usage] => Array
        (
            [prompt_tokens] => 14378
            [completion_tokens] => 101
            [total_tokens] => 14479
        )

    [system_fingerprint] => fp_611b667b19
)

True that!

While the cost and speed are great, the performance on some tasks (like the one above) is far inferior.

But, this is insane! At 14,000 tokens per image, I can upload fewer than 10 pages before I exceed the 128K token limit. We’re not talking images of cats or ducks, but documents. You know, like they use in business?

Come on OpenAI, say it ain’t true!


You could separate the task into two separate calls, one for image-to-text, and a second for reasoning, for best of both worlds. That’s a little over double the complexity, but like a lot of engineering choices, it comes down to money vs. time.
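A minimal sketch of that two-call split. These helpers only build request payloads in the chat-completions shape; the prompts and model choices are illustrative, and actually sending them requires the openai package and an API key (shown in the comment at the bottom):

```python
def ocr_request(image_url: str) -> dict:
    """Call 1: image-to-text with a vision-capable model."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Extract the text from this image. Return only the text."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
    }

def reasoning_request(document_text: str, task: str) -> dict:
    """Call 2: reasoning over the already-extracted plain text with a cheaper text model."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": task},
            {"role": "user", "content": document_text},
        ],
    }

# Usage (hypothetical, with the OpenAI Python SDK):
#   client = OpenAI()
#   text = client.chat.completions.create(**ocr_request(url)).choices[0].message.content
#   answer = client.chat.completions.create(**reasoning_request(text, "Summarize.")).choices[0].message.content
```

The second call sees only text, so it is billed at mini's cheap text rate regardless of how big the source image was.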

2 Likes

There are quite a few topics here about mini and images. The explanation is always the same: working with images costs the same amount of money as with 4o, so mini uses more tokens. Only text is cheaper.

Otherwise, mini is a lightweight version of 4o.
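For a rough sense of where those counts come from: high-detail images are scaled and then tiled into 512×512 patches, and gpt-4o-mini is billed roughly 33x the per-tile token counts of gpt-4o so that the dollar cost comes out the same. A sketch of that arithmetic; the mini constants (2833 base, 5667 per tile) are the commonly cited values, so treat this as an approximation rather than the official formula:

```python
import math

def tile_count(width: int, height: int) -> int:
    # High-detail images are first scaled to fit within 2048x2048,
    # then the shortest side is scaled down to at most 768px,
    # and the result is covered with 512x512 tiles.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    return math.ceil(w / 512) * math.ceil(h / 512)

def image_tokens(width: int, height: int, model: str = "gpt-4o") -> int:
    tiles = tile_count(width, height)
    if model == "gpt-4o-mini":
        # ~33x the gpt-4o constants, so the dollar cost per image matches.
        return 2833 + 5667 * tiles
    return 85 + 170 * tiles
```

So a single 512×512 image costs 255 tokens on gpt-4o but 8,500 on mini, and a multi-tile document page lands in the five-figure range mini users are seeing here.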

But, this is the first call:

For doing nothing more than extracting the text from this image:

14,000 tokens charged for 600 characters of text?

This is the request in the Playground: https://platform.openai.com/playground/chat?models=gpt-4o-mini&preset=preset-vjZ8QKzF1RaISBGbWkExyRNk

And, just to compare apples to apples:

Gemini Pro charges 601 tokens for the exact same prompt and image. I suspect Gemini Flash is even cheaper.


Oh, I meant separating it into two calls, one of which goes to Anthropic Claude or Google Gemini, not two separate calls to 4o-mini.

I did confirm that the 33x token “count” is only for billing purposes. I appended the same image at high quality (billed at ~35,000 tokens each) five times to a single request, just to see whether it would complete or the API would throw an error for exceeding the context limit.

    ChatCompletion(id='chatcmpl-9n9JDWnMQvsftnHyjiCDzDz2DdH2E', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The images are the same.', role='assistant', function_call=None, tool_calls=None))], created=1721500883, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_611b667b19', usage=CompletionUsage(completion_tokens=6, prompt_tokens=184206, total_tokens=184212))

You can see that the API completes the call, and bills you 184k tokens, so at least that works. The same call to gpt-4o (non-mini) is about 5500 input tokens.


Hmmm… I didn’t think of using gpt-4o. Can we send images to gpt-4o using chat completion?

This is in the gpt-4o-mini cookbook under URL Image Processing:

    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
            {"role": "user", "content": [
                {"type": "text", "text": "What's the area of the triangle?"},
                {"type": "image_url", "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png"}
                }
            ]}
        ],
    )
    print(response.choices[0].message.content)

Wait, I just decided to switch out gpt-4o-mini for gpt-4o as the model, and this is what I got:

Array
(
    [id] => chatcmpl-9nCbYwR4MeF0Iw7U730luU9KLhLiY
    [object] => chat.completion
    [created] => 1721513552
    [model] => gpt-4o-2024-05-13
    [choices] => Array
        (
            [0] => Array
                (
                    [index] => 0
                    [message] => Array
                        (
                            [role] => assistant
                            [content] => 10. ____ Sick Leave

Modify Article 9 of the Local #161 Motion Picture Theatrical and TV Series Production Agreement (and make conforming changes to Article 41 of the Local #161 Supplemental Digital Agreement) as follows:

“ARTICLE 9. SICK LEAVE

“(a) Paid Sick Leave in the State of New York: The following is applicable only to employees working under this Agreement in the State of New York:

“(1) Commencing [insert the date that is the first Sunday after 30 days following the AMPTP’s receipt of notice of ratification], employees shall accrue one (1)
                        )

                    [logprobs] => 
                    [finish_reason] => stop
                )

        )

    [usage] => Array
        (
            [prompt_tokens] => 636
            [completion_tokens] => 128
            [total_tokens] => 764
        )

    [system_fingerprint] => fp_18cc0f1fa0
)

So, under the circumstances, it makes more sense to use gpt-4o than gpt-4o-mini for image processing, ONLY because ten pages of this very same image exceed the 128K-token input limit with mini, while I can get over 200 pages of the same image processed with gpt-4o.

But now we’re back to gpt-4o’s 4K output-token limit.

Sigh…


The 33x input token multiplier doesn’t appear to actually apply for context limit purposes. The model handles the “184k” input just fine without raising an error. I believe it only “sees” the ~5500 tokens for 5 high-res images.


Thanks for looking into it. I’ll try it out. I’m looking to process 20, 30, possibly 50 images at a time. If images are priced at gpt-4o levels, I guess that’s OK as long as it doesn’t dramatically affect the output tokens. That will work for me.

This is what I ended up doing to get around the 4K token limit:

1. convert local pdf to jpg pages

2. upload jpg images to AWS s3 bucket

3. submit jpg images with prompt to OpenAI model in batches

4. continue processing if max tokens exceeded.

5. write output to local txt file

Doesn’t lower the token cost, but does allow me to efficiently process larger documents when the output exceeds 4K tokens.
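Step 4 (continuing when the output cap is hit) can be sketched like this; `call_model` is a placeholder for whatever function makes the actual API call and returns `(reply_text, finish_reason)`:

```python
def extract_with_continuation(call_model, messages, max_rounds=10):
    """Keep asking the model to continue while it stops at the output-token cap.

    A finish_reason of "length" means the model hit its max output
    tokens mid-answer rather than finishing naturally with "stop".
    """
    parts = []
    for _ in range(max_rounds):
        reply, finish_reason = call_model(messages)
        parts.append(reply)
        if finish_reason != "length":
            break
        # Feed the partial answer back and ask for the remainder.
        messages = messages + [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```

The round cap guards against a model that keeps emitting "length" forever; tune it to the longest document you expect.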
