Why does the Responses API use more tokens?


Assistants + Threads API vs Responses

6000 vs 25000

Isn’t that a lot of difference?

Both 4o-mini-latest

The difference is in the details.

Specifically, the image detail setting cost difference between high and low.

The token price of sending images to gpt-4o-mini is inflated - multiplied - making them roughly twice as expensive in actual cost as the same images on gpt-4o.

Low detail: 85 tokens x 33.33 ≈ 2833 tokens

High detail: 25000 tokens / 33.33 ≈ 750 tokens. That’s roughly a tile cost: 1 base image @ 85 + four “tiles” @ 170 = 765 tokens.
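The arithmetic above can be sketched in a few lines - assuming the documented gpt-4o tile rules (85-token base + 170 per 512px tile) and the published gpt-4o-mini per-image figures (2833 base, 5667 per tile), which is where the ~33.33x multiplier comes from:

```python
import math

# Documented gpt-4o image token rules: 85 base + 170 per 512px tile.
# gpt-4o-mini uses inflated per-image figures (2833 base, 5667 per tile),
# i.e. roughly a 33.33x multiplier, so the dollar cost lands near gpt-4o's.
BASE = {"gpt-4o": 85, "gpt-4o-mini": 2833}
TILE = {"gpt-4o": 170, "gpt-4o-mini": 5667}

def image_tokens(width: int, height: int, model: str = "gpt-4o-mini",
                 detail: str = "high") -> int:
    """Estimate the input tokens billed for one image."""
    if detail == "low":
        return BASE[model]
    # High detail: scale to fit within 2048x2048, then shortest side to 768.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = round(width * scale), round(height * scale)
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = round(width * scale), round(height * scale)
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return BASE[model] + TILE[model] * tiles

print(image_tokens(128, 64, "gpt-4o-mini", "low"))   # 2833
print(image_tokens(128, 64, "gpt-4o-mini", "high"))  # 8500: one tile
print(image_tokens(128, 64, "gpt-4o", "high"))       # 255 = 85 + 170
```

Even a tiny 128x64 image still occupies one full tile at high detail, which is why the minimum high-detail bill on gpt-4o-mini is 2833 + 5667 = 8500 tokens.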

Rephrased:

Why does setting the parameters incorrectly - not fully translating them into the different API object parameters that Responses requires for vision user messages - change the cost?

Both options are actually set to high, and both use 4o-mini. So thanks for your comment, but as you can see by trying the same example, I have the same problem in all 3 of my accounts.

Chat Completions, gpt-4o-mini

Responses, gpt-4o-mini

A 128x64 image, 364 bytes, = 8508 tokens when you let “auto”/the default do its thing, delivering “high” detail and expense - despite any promise that “auto” is anything other than always “high”, as it has been since day 1. And the Prompts playground gives you no detail option to choose.

BTW, 8500 is exactly an 85-token base plus one 170-token tile (255 tokens, or 85 x 3), times the 33.33x multiplier.

So it is pretty much impossible to be billed less than ~8500 tokens for a single image on gpt-4o-mini without setting detail: low.
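Part of the trap is that the detail setting lives in a different place in each API’s message object, so it is easy to drop when porting. A sketch of the two shapes as I understand the current docs (the URL is a placeholder):

```python
# Chat Completions: "detail" is nested inside the image_url object.
chat_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/tiny.png",  # placeholder URL
                "detail": "low",
            },
        },
    ],
}

# Responses: the content part type is "input_image", "image_url" is a plain
# string, and "detail" sits directly on the part. Omitting it means "auto",
# which gets billed like "high".
responses_input = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "Describe this image."},
        {
            "type": "input_image",
            "image_url": "https://example.com/tiny.png",  # placeholder URL
            "detail": "low",
        },
    ],
}
```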

It could be that Assistants actually puts some “detail” intelligence in front of the real API call it makes. Or it is giving you cheap, low-detail pictures, which would explain its poor-quality vision.

One would have to care about Assistants to report an issue with underbilling or under-seeing.