Hi everyone,
I’m using the Responses API with the image_generation tool to generate images through the gpt-image-1 model.
I’d like to programmatically calculate the cost of each request, but I noticed that the result.usage field does not include any image tokens or image generation cost.
**What I observed**
- The streamed response includes usage for text tokens from gpt-4.1, but no information about the image generation cost or tokens from gpt-image-1.
- There’s no field in the response or event stream that reports how much image generation costs.
**Questions**
- How can I retrieve or estimate the image generation cost for each API call (e.g. via a billing endpoint, usage API, or a formula)?
- When using the image_generation tool, is the cost composed of (a) text token usage from gpt-4.1 and (b) image generation cost from gpt-image-1?
You cannot programmatically calculate the cost, because the AI on the Responses API runs a tool loop in which it can call tools multiple times, the input context can grow and be re-billed on each iteration, and the user input can make the AI produce no pictures or half a dozen. You are also billed for vision on the images you supply to the chat model itself, which can happen repeatedly.
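To make the re-billing point concrete, here is a minimal arithmetic sketch. It assumes the whole context is re-sent and billed as input on every iteration of the tool loop and ignores any discounts. The function name and the token counts are hypothetical, not values reported by the API.

```python
def billed_input_tokens(base_context: int, added_per_turn: list[int]) -> int:
    """Total input tokens billed if the full context is re-sent on every turn.

    base_context: tokens in the first request (instructions, tool
        description, user input). Hypothetical figure.
    added_per_turn: tokens appended to the context after each turn
        (tool calls, tool results). Hypothetical figures.
    """
    context = base_context
    total = context  # the first turn bills the base context once
    for added in added_per_turn:
        context += added   # the context grows...
        total += context   # ...and the whole thing is billed again
    return total


# Example: 1000 base tokens, two extra iterations adding 300 tokens each.
# Turns are billed as 1000 + 1300 + 1600 input tokens.
print(billed_input_tokens(1000, [300, 300]))
```

The number of iterations is decided by the model at run time, so a figure like this can only be worked out after the fact, not predicted before the call.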
You also cannot determine the costs because, in a completely undocumented manner, a lengthy or full copy of the same chat is extracted and passed to the image model. A tool call is not merely passing a “prompt”; it also passes the image creation model the full context of the chat: all images, all text, likely up to what gpt-4o can handle as input. The chat AI doesn’t have to send any prompt language at all in its tool call to have an image generated. The proof of this context loading, and of the double-billing for images sent into a second AI model, is how far back in your chat you can still say, “edit the first image again.”
Yes, the image model being called also has its own input and vision fees for each input.
It is completely unhinged in a “trust me” opaque manner.
Just adding the image tool costs a huge chunk of billed input tokens for its description alone.
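If you could get or estimate the token counts on both sides, the cost components described in this thread would add up roughly as in the sketch below. Everything in it is an assumption for illustration: the `Rates` class, the `estimate_cost` function, the split into five token categories, and the placeholder prices, which are not real pricing. The image-side counts in particular are not reported in the Responses usage, so they would have to be guessed.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Prices in USD per 1M tokens. Fill in from the current pricing page."""
    chat_input: float         # chat model input (text and vision)
    chat_output: float        # chat model output
    image_text_input: float   # image model, text input
    image_image_input: float  # image model, image input
    image_output: float       # image model, generated image tokens


def estimate_cost(rates: Rates, chat_in: int, chat_out: int,
                  img_text_in: int, img_image_in: int, img_out: int) -> float:
    """Sum the chat-model and image-model components for one request."""
    per_million = (
        chat_in * rates.chat_input
        + chat_out * rates.chat_output
        + img_text_in * rates.image_text_input
        + img_image_in * rates.image_image_input
        + img_out * rates.image_output
    )
    return per_million / 1_000_000


# Placeholder rates and token counts, chosen only to show the arithmetic.
placeholder = Rates(chat_input=1.0, chat_output=2.0, image_text_input=3.0,
                    image_image_input=4.0, image_output=5.0)
print(estimate_cost(placeholder, chat_in=2000, chat_out=100,
                    img_text_in=500, img_image_in=1000, img_out=4000))
```

The chat-side counts would come from result.usage summed over every iteration; the image-side counts are the part that stays opaque.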
For the image generations and edits endpoints, at least, I can give you answers.