Responses API Image Generation Token Usage

I have made a topic about this before, highlighting the non-disclosure of how the technology actually works: the image tool is not an independent tool that receives only a prompt, but rather receives context from the chat, including past images, which also continue to incur vision input costs.

The thing is: the image tokens of the gpt-image-1 model are not what you see in the usage object; the image tool is billed at a different rate.

I hope to classify the usage and thresholds a bit more, but such work could be made obsolete by proper documentation, rather than “try this out…and pay”.


Hot tip:

Save some vision expense when passing in images: resize so the shorter dimension is 512 pixels, or so the longer dimension is at most 1024 pixels. The first is the maximum internal resize applied for the image creation model; the second saves expense by not going over two 512px “tiles” on recurring vision input.
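A rough sketch of that pre-shrink with Pillow (the helper name and file handling are mine, not anything from an SDK):

from PIL import Image

def shrink_for_vision(path: str, out_path: str) -> None:
    """Downscale so the shorter side is at most 512px and the longer side
    at most 1024px (the limits described above); never upscale."""
    img = Image.open(path)
    w, h = img.size
    scale = min(512 / min(w, h), 1024 / max(w, h), 1.0)
    if scale < 1.0:
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.save(out_path)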


Estimating basics

A normal “hello” call, with the lowest max_output_tokens allowed, 16:

['Hello! How can I assist you today?']
{
  "input_tokens": 10,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 10,
  "output_tokens_details": {
    "reasoning_tokens": 0
  },
  "total_tokens": 20
}
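For reference, roughly the kind of call that produces the above, using the Python SDK (a sketch, not the exact script I ran):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    input="hello",
    max_output_tokens=16,  # 16 is the minimum the API accepts
)
print([response.output_text])
print(response.usage)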

With the addition of tools=[{"type": "image_generation"}], here is the increase per API iteration just from the tool specification language that gets added:

{
  "input_tokens": 265,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 11,
  "output_tokens_details": {
    "reasoning_tokens": 0
  },
  "total_tokens": 276
}
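That is, the only change from the previous call is the tools parameter, roughly (again a sketch):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="hello",
    max_output_tokens=16,
    # The tool definition alone adds roughly 255 input tokens per iteration
    # (265 vs. 10 above), even though no image is requested or produced.
    tools=[{"type": "image_generation"}],
)
print(response.usage)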

With tool invocation, you pay for the input (at least) twice: the second iteration bills the prior input again, plus the first output now billed as input, plus the new tool response, and then whatever the AI writes again.

The absolute minimum-cost image: quality low, size 1024x1024, and a 14-token prompt only.

  • The cost shown below is NOT for the image itself; it is just the gpt-4o side.

“Create the OpenAI logo. Just say ‘Done’ when complete.”

['Done.']
{
  "input_tokens": 674,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 33,
  "output_tokens_details": {
    "reasoning_tokens": 0
  },
  "total_tokens": 707
}
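A sketch of the call for that case; the quality and size keys inside the tool object are my reading of the tool options, so verify them against current documentation:

import base64
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Create the OpenAI logo. Just say 'Done' when complete.",
    tools=[{
        "type": "image_generation",
        "quality": "low",       # avoid the unpredictable "auto" default
        "size": "1024x1024",
    }],
)
print(response.output_text)  # "Done."
print(response.usage)        # gpt-4o tokens only; gpt-image-1 tokens do not appear

# The generated image arrives as a base64 tool-call result item.
for item in response.output:
    if item.type == "image_generation_call":
        with open("openai_logo.png", "wb") as f:
            f.write(base64.b64decode(item.result))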

Useless Usage

The API response object does not report image tokens at all, and I am making direct RESTful calls to get the usage unfiltered by an SDK.
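Something like this, if you want the raw usage block straight from the endpoint (a sketch using requests):

import os, requests

resp = requests.post(
    "https://api.openai.com/v1/responses",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",
        "input": "Create the OpenAI logo. Just say 'Done' when complete.",
        "tools": [{"type": "image_generation", "quality": "low", "size": "1024x1024"}],
    },
)
print(resp.json()["usage"])  # usage exactly as the API returns it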

The tool defaults to auto quality (if you can imagine) and auto size, giving unpredictable costs.

Then one has to dig many clicks into the Usage page (with the legacy usage view now gone), independently looking through seven different categories such as “cached input tokens” and “model requests” under chat completions, only to see the input tokens reported anywhere.

gpt-image-1-2025-04-23
input: 23 tokens (reflecting receipt of the 33 output tokens, minus “Done.”)
output: (should be 272 tokens = $0.01088 at $40 per 1M image output tokens; not yet shown in usage)

Resulting in

There is no way to find the image output cost anywhere other than as money in “total spend”, and nowhere in tokens.