One apparent benefit of o3-pro: at least you aren't the one paying for hundreds of tokens of unseen decisions and moderation, as in other reasoning models.
O3-Pro
36 output tokens billed against 26 tokens actually received.
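To reproduce the billed-vs-received comparison, here's a minimal sketch that re-tokenizes the visible response text and compares it to the usage report. It assumes the Responses API and the o200k_base encoding (OpenAI hasn't published o3-pro's tokenizer), so the local count is approximate; swapping the model name to o1-pro should show the larger gap reported below.

```python
# Minimal sketch: compare billed output tokens to the tokens you can
# actually see in the response text. o200k_base is an assumption, not
# a documented fact about o3-pro, so treat the local count as rough.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # assumed encoding

resp = client.responses.create(
    model="o3-pro",
    input="Say hello in five words.",
)

received = len(enc.encode(resp.output_text))  # tokens visible to you
billed = resp.usage.output_tokens             # tokens you pay for
print(f"billed={billed} received~={received} overhead~={billed - received}")
```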
The base vision tile also still appears to bill at 85 tokens instead of 75.
O1-Pro
193 more output tokens billed than received:
Note the peculiar vision input billing of o1-pro, also seen in o1. The pricing guide says a 512x512 image should be one tile (75 or 85 tokens). Here, however, a detail:low image always bills at 22 tokens, minimum and maximum, including container overhead, and detail:high, as shown, is 41 tokens including its text. A 512x513 image jumps to 63 tokens, 22 more input tokens. Perhaps a price break because of the stratospheric cost otherwise? At the very least, o1's vision pricing formula is undisclosed and unpublished.
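A rough way to probe this undocumented formula: send the same one-line prompt with synthetic blank images at the sizes discussed above and log the billed input tokens for each case. This is a measurement sketch, not the formula itself; it uses o1 via Chat Completions (o1-pro is Responses-only), and the sizes and detail levels are just the cases above.

```python
# Probe the undocumented vision billing: same prompt, varying image
# size and detail level, recording usage.prompt_tokens for each run.
import base64, io
from openai import OpenAI
from PIL import Image

client = OpenAI()

def data_url(width: int, height: int) -> str:
    """Build a blank PNG of the given size as a base64 data URL."""
    buf = io.BytesIO()
    Image.new("RGB", (width, height)).save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

for (w, h), detail in [((512, 512), "low"), ((512, 512), "high"),
                       ((512, 513), "high")]:
    resp = client.chat.completions.create(
        model="o1",  # o1 shows the same billing pattern as o1-pro
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe this image in one word."},
            {"type": "image_url",
             "image_url": {"url": data_url(w, h), "detail": detail}},
        ]}],
    )
    print(f"{w}x{h} detail={detail}: prompt_tokens={resp.usage.prompt_tokens}")
```

Subtracting the no-image baseline for the same prompt isolates what the image itself adds, which is how the 22/41/63 figures above fall out.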
Adding images adds around 1-3 seconds of latency across all other models. So with these 15-second response times, there is either a queue, there are unseen moderations or decisions happening before your billed task… or OpenAI has figured out how to ship a model with a 3-token-per-second generation rate.
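That 3-token-per-second figure is just billed output divided by wall-clock time; a quick timing sketch like the one below makes the same inference. If roughly 36 output tokens take roughly 15 seconds, something other than decoding is eating the time.

```python
# Time a tiny request and compute the apparent generation rate from
# billed output tokens. A rate near 3 tok/s on a short answer points
# to queueing or hidden pre-work rather than raw decoding speed.
import time
from openai import OpenAI

client = OpenAI()

t0 = time.monotonic()
resp = client.responses.create(model="o3-pro", input="Say hello.")
elapsed = time.monotonic() - t0

rate = resp.usage.output_tokens / elapsed
print(f"{resp.usage.output_tokens} output tokens in {elapsed:.1f}s "
      f"= {rate:.1f} tok/s")
```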