Thanks for the update on the o3 model pricing — the 80% reduction is a welcome change.
However, I wanted to flag a potential billing issue I encountered. I ran a batch with o3-2025-04-16 at 23:03 UTC on June 10, 2025, shortly after the new pricing was officially announced as “now in effect” (21:18 UTC that same day), but I was still charged the old rates of $5/million for input and $20/million for output, totaling $150.00 for the batch.
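For reference, here is the back-of-envelope I'm using to sanity-check the charge; the token split below is a hypothetical placeholder, and I'm assuming the 80% cut applies uniformly to these rates. Only the old per-million prices and the total come from my invoice.

```python
# Sanity-check sketch: what the same batch would cost at the old vs. reduced rates.
# The token counts are hypothetical placeholders; only the rates are from the invoice/announcement.
def batch_cost(input_tokens: int, output_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    return (input_tokens / 1e6) * input_rate_per_m + (output_tokens / 1e6) * output_rate_per_m

# Example split that reproduces the $150.00 charge at the old rates ($5 in / $20 out per million):
in_tok, out_tok = 10_000_000, 5_000_000          # hypothetical usage
print(batch_cost(in_tok, out_tok, 5.0, 20.0))    # 150.0  (what I was billed)
print(batch_cost(in_tok, out_tok, 1.0, 4.0))     # 30.0   (same usage with an 80% reduction)
```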
I tried to contact the support team, but I was only able to reach the automated Operator and couldn’t find a way to submit a formal request about the issue.
Hm! If you send a message at the bottom of https://help.openai.com/en/?q=contact while logged in and create a case with our Support team, they’ll be able to look into your account. (Unfortunately, we aren’t able to investigate individual account issues in this forum.)
Vision input token consumption does not match between o3 and o3-pro: o3-pro bills more input tokens for the same image, on top of its higher per-token price.
67x80 image
Sending nothing other than the image:
o3-pro: 262 tokens
o3: 232 tokens
The extra 30 tokens (10 × 3) suggests 85-token image billing per base/tile, like other mainline models, rather than o3’s 75 (or the 65 base + 129/tile of computer-use-preview or gpt-image-1).
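A minimal sketch of that inference, assuming three billing units for this image plus a small fixed message overhead; none of these numbers are documented, they are just the decomposition that fits the observed 262 vs. 232:

```python
# Hypothetical decomposition that fits the observed counts; the unit count (3), the
# per-unit rates (85 vs. 75), and the ~7-token message overhead are assumptions, not documented values.
def image_input_tokens(units: int, per_unit: int, message_overhead: int = 7) -> int:
    return units * per_unit + message_overhead

print(image_input_tokens(3, 85))  # 262 -> matches o3-pro
print(image_input_tokens(3, 75))  # 232 -> matches o3
print(image_input_tokens(3, 85) - image_input_tokens(3, 75))  # 30 == 3 units x 10 tokens
```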
A simple input of a single image to o3-pro, with only a small token bill, still meant an extensive wait.
I suspect the lack of a flex processing discount is because something similar is already being done by the “inference efficiencies” also coming to this model: if you’re willing to wait for a long response, you may also be waiting behind other jobs while your API call is transparently slotted into a queue…
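For contrast, opting into flex processing on a model that does offer it looks roughly like the sketch below (service_tier="flex" on o3 or o4-mini, as I understand it; o3-pro simply doesn’t get the discount):

```python
from openai import OpenAI

client = OpenAI()

# Sketch only: flex processing is opted into with service_tier="flex" on models that
# support it (o3 / o4-mini, as I understand); o3-pro has no flex discount per the pricing page.
resp = client.responses.create(
    model="o3",
    service_tier="flex",
    input="Summarize the trade-offs of queued, lower-priority inference.",
)
print(resp.usage.total_tokens)
```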
One apparent benefit of o3-pro: at least you aren’t the one paying for hundreds of tokens of unseen decisions and moderation, as in other reasoning models.
Note the peculiar vision input billing of o1-pro, also seen in o1. The pricing guide says a 512x512 image would be 1 tile (75 or 85 tokens). Here, however, a detail:low image is always 22 tokens including container overhead, and detail:high as shown is 41 tokens with its text; at 512x513 it jumps to 63, another 22 tokens of input. Perhaps a price break because of the stratospheric cost otherwise? At the very least, o1’s vision pricing formula is undisclosed and unpublished.
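These counts should be reproducible by sending a synthetic image and reading back the billed input tokens; a rough sketch, assuming the Responses API input_image format and a Pillow-generated blank PNG standing in for a real image:

```python
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()

def billed_input_tokens(model: str, width: int, height: int, detail: str) -> int:
    """Send a blank PNG of the given size and return the billed input token count."""
    buf = io.BytesIO()
    Image.new("RGB", (width, height), "white").save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    resp = client.responses.create(
        model=model,
        input=[{
            "role": "user",
            "content": [{
                "type": "input_image",
                "image_url": f"data:image/png;base64,{b64}",
                "detail": detail,
            }],
        }],
    )
    return resp.usage.input_tokens

# Probe the size boundary described above (o1 / o1-pro, detail:high around 512px):
for w, h in [(512, 512), (512, 513)]:
    print(w, h, billed_input_tokens("o1", w, h, "high"))
```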
Adding images adds around 1-3 seconds of latency across all other models. So with these 15-second response times, there’s either a queue, unseen moderation or decisions happening before your billed task… or OpenAI figured out how to ship a model with a 3-token-per-second generation rate.
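A quick way to separate queue time from generation speed is to time the call and divide the billed output tokens by the wall-clock seconds; a back-of-envelope sketch, not a benchmark:

```python
import time

from openai import OpenAI

client = OpenAI()

start = time.monotonic()
resp = client.responses.create(model="o3-pro", input="Say hello.")
elapsed = time.monotonic() - start

out_tokens = resp.usage.output_tokens
print(f"{elapsed:.1f}s wall clock, {out_tokens} output tokens, "
      f"{out_tokens / elapsed:.1f} tok/s apparent rate")
# A low apparent rate with few output tokens points at queueing or unseen
# pre-processing rather than slow generation itself.
```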