I’m testing out the Responses API for the first time and I can’t find any information on token input/output usage when using the image generation tool.
Model: gpt-4.1-mini-2025-04-14
Input: {role: 'user', content: 'Create a simple icon for GPT-4.1'}
revised_prompt: 'A simple and modern icon representing GPT-4.1, featuring the text "GPT-4.1" in a sleek, futuristic font. The icon should have a clean design with a blue and white color scheme, incorporating subtle tech elements like circuit lines or digital nodes around the text. The background should be plain or gradient for a professional look.'
output_text: 'Here is a simple and modern icon for GPT-4.1 featuring a sleek design with circuit lines and a blue gradient background. Let me know if you want any adjustments!'
Image Output: {
quality: 'medium',
size: '1024x1024'
}
usage: {
input_tokens: 2294,
input_tokens_details: { cached_tokens: 0 },
output_tokens: 119,
output_tokens_details: { reasoning_tokens: 0 },
total_tokens: 2413
}
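For reference, this is roughly how I’m making the call and reading usage (a minimal Node SDK sketch; the tool options are the ones I set, everything else is left at defaults):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

async function main() {
  const response = await client.responses.create({
    model: "gpt-4.1-mini-2025-04-14",
    input: "Create a simple icon for GPT-4.1",
    tools: [{ type: "image_generation", quality: "medium", size: "1024x1024" }],
  });

  // All I get back is the combined usage object above; there is no
  // per-modality breakdown for the image generation tool call.
  console.log(response.usage);
}

main();
```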
This doesn’t line up with the docs, which say:
So the final cost is the sum of:
- input text tokens
- input image tokens if using the edits endpoint
- image output tokens
But I assume that’s specifically for the Image API.
It’s important for us to be able to track usage per user when calling the API, so I was hoping for more detailed usage stats for image inputs and outputs. It seems the Responses API doesn’t break out multimodal usage in any detail yet?
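To make that concrete, this is the kind of per-user accounting I want to do. `userId` and the in-memory map are just placeholders for illustration; in production this would be a database row:

```ts
type UsageTotals = { input_tokens: number; output_tokens: number; total_tokens: number };

// Hypothetical per-user accumulator, keyed by our own user IDs.
const usageByUser = new Map<string, UsageTotals>();

function recordUsage(userId: string, usage: UsageTotals) {
  const prev = usageByUser.get(userId) ?? { input_tokens: 0, output_tokens: 0, total_tokens: 0 };
  usageByUser.set(userId, {
    input_tokens: prev.input_tokens + usage.input_tokens,
    output_tokens: prev.output_tokens + usage.output_tokens,
    total_tokens: prev.total_tokens + usage.total_tokens,
  });
}
```

This works fine for text, but with the image tool in play the totals above undercount the actual image cost.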
My best guess is that the reported usage covers the original text and tool input tokens, plus the generated image and text being fed back into the text model; the revised prompt and the text output together roughly account for the 119 output tokens.
Is the best bet in the meantime to estimate the gpt-image-1 usage myself, and work out how many tokens it used for processing the image?
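If so, a lookup like the sketch below is what I have in mind. The per-image output token counts are my reading of the published gpt-image-1 figures, not something the API returns, so treat them as assumptions to verify against the current pricing docs:

```ts
// Assumption: per-image output token counts for gpt-image-1, copied from the
// image generation pricing docs at the time of writing. Verify before billing.
const IMAGE_OUTPUT_TOKENS: Record<string, Record<string, number>> = {
  low:    { "1024x1024": 272,  "1024x1536": 408,  "1536x1024": 400 },
  medium: { "1024x1024": 1056, "1024x1536": 1584, "1536x1024": 1568 },
  high:   { "1024x1024": 4160, "1024x1536": 6240, "1536x1024": 6208 },
};

function estimateImageOutputTokens(quality: string, size: string): number | undefined {
  return IMAGE_OUTPUT_TOKENS[quality]?.[size];
}

// For the response above: medium at 1024x1024 -> ~1056 image output tokens,
// on top of the 119 text output tokens the usage object actually reports.
console.log(estimateImageOutputTokens("medium", "1024x1024"));
```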