Responses API: How to identify the exact underlying image generation model for precise internal billing?

I’m currently working on a SaaS application where I need to implement a precise internal billing system, deducting credits from my users based on their exact token consumption.

To get accurate usage metrics (especially to track prompt_cache_hit_tokens for input and exact completion_tokens for output), I migrated from the standard Image API (images/generations) to the new Responses API.

The integration works flawlessly, but I’ve hit a roadblock regarding cost calculation.

The documentation states: “The Responses API image generation tool uses its own GPT Image model selection.” While the Response payload correctly provides the token counts in the usage object, it doesn’t seem to expose the name of the underlying image model that was actually used.

Since output token pricing varies drastically between models, I can't multiply completion_tokens by the right per-token price without knowing which model was actually used.
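
For concreteness, the calculation I want to run is trivial once the model name is known. Here is a minimal Python sketch; the prices are placeholders (not real OpenAI pricing), and the missing input is exactly the model name the payload doesn't expose:

```python
# Placeholder per-1M-token output prices -- NOT real OpenAI pricing.
# `model_used` is the piece of information the Responses API payload
# does not currently expose for the image generation tool.
OUTPUT_PRICE_PER_1M_TOKENS = {
    "gpt-image-1": 40.00,
    "gpt-image-1-mini": 8.00,
}

def image_output_cost_usd(model_used: str, completion_tokens: int) -> float:
    """Output-token cost in USD for a single image generation call."""
    return completion_tokens * OUTPUT_PRICE_PER_1M_TOKENS[model_used] / 1_000_000
```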

My questions are:

  1. Is there a strict, documented mapping between the driver LLM and the image model? For instance, is it guaranteed that gpt-5-mini will always trigger gpt-image-1-mini, and gpt-5.5 will always trigger gpt-image-2?

  2. Is there a way to extract the exact image model name directly from the Response object payload? (I checked the SDK and couldn’t find a parameter for it).

  3. If this is currently a “black box”, are there any plans to expose the underlying image model in the usage block or metadata in future updates? Precise cost attribution is vital for developers building user-facing applications.

Thanks in advance for any insights or official clarifications!

The API reference for tools makes clear that model selection is yours to control: instead of relying on "auto" selection, you can specify exactly which model shall be used.

It also states that gpt-image-1 is the default model, but the API reference does not yet even include gpt-image-2.

Costs are another matter entirely. An ImageGenerationCall output item or event does not provide any usage or model information, nor any billing details, nor the cost of image inputs that were automatically collected from the chat as image generation context. Billing happens at the image model's per-image cost, so it wouldn't show up in "usage details" anyway. When you "chat" with an image model this way, you are effectively accepting that a single response could cost you $1.00+ whenever the AI invokes this tool, because the mechanism for collecting billable input context is obfuscated.

The API reference documentation page is currently a formatting mess. Here are the parameters that can be passed to "tools" when you include the image_generation tool, listed alphabetically, in a regrettably wide table on a forum that can't go wide, with a schema that is also reused for the response output echo (in case some entries don't look like inputs).


Image Generation Tool

The image generation tool creates new images or edits existing images using GPT image models.

Use this tool by including a configuration with:

{
  "type": "image_generation"
}

Additional parameters may be provided to control the model, image size, quality, background, output format, editing behavior, and streaming previews.


Basic Example

{
  "type": "image_generation",
  "model": "gpt-image-1.5",
  "action": "generate",
  "size": "1024x1024",
  "quality": "high",
  "background": "auto",
  "output_format": "png"
}
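
Wired into a request, this configuration is passed through the tools array. A sketch with the openai Python SDK; the actual client call is commented out so nothing hits the network, and the driver model name is an assumption:

```python
# The tool configuration mirrors the JSON example above.
image_tool = {
    "type": "image_generation",
    "model": "gpt-image-1.5",
    "action": "generate",
    "size": "1024x1024",
    "quality": "high",
    "background": "auto",
    "output_format": "png",
}

# Commented out so this sketch runs without credentials:
# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(
#     model="gpt-5",  # driver model name is an assumption
#     input="Generate a watercolor painting of a red fox",
#     tools=[image_tool],
# )
```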

Parameter Reference

| Parameter | Required | Accepted Values | Default | Description |
| --- | --- | --- | --- | --- |
| type | Yes | "image_generation" | (none; required) | Identifies this as the image generation tool. This value must always be "image_generation". |
| action | No | "generate", "edit", "auto" | "auto" | Controls whether the tool should create a new image, edit an existing image, or automatically choose the appropriate behavior. |
| background | No | "transparent", "opaque", "auto" | "auto" | Controls whether the generated image should have a transparent background, an opaque background, or automatic background handling. |
| input_fidelity | No | "high", "low" | "low" | Controls how closely the output should preserve style, identity, and visual details from input images. Especially relevant for facial features and image edits. |
| input_image_mask | No | See Input Image Mask | (none) | Provides a mask image for inpainting or targeted image editing. |
| model | No | "gpt-image-1", "gpt-image-1-mini", "gpt-image-1.5", or another supported model name as a string | "gpt-image-1" | Selects the image generation model. |
| moderation | No | "auto", "low" | "auto" | Controls the moderation strictness applied to generated images. |
| output_compression | No | Number | 100 | Controls output image compression. Mainly relevant for compressed formats such as JPEG or WebP. |
| output_format | No | "png", "webp", "jpeg" | "png" | Sets the file format for the generated image. |
| partial_images | No | Number from 0 to 3 | 0 | Controls how many partial image previews are produced while streaming. Use 0 to disable partial images. |
| quality | No | "low", "medium", "high", "auto" | "auto" | Controls the quality level of the generated image. Higher quality may increase generation time or cost. |
| size | No | "1024x1024", "1024x1536", "1536x1024", "auto" | "auto" | Sets the output image dimensions. Use "auto" to let the system choose. |
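
The accepted values above can be enforced client-side before a request is ever sent. A minimal sketch; the value sets are transcribed from the table, not from an official published schema:

```python
# Client-side validation of an image_generation tool config.
# Value sets transcribed from the parameter table (not an official schema).
ALLOWED = {
    "action": {"generate", "edit", "auto"},
    "background": {"transparent", "opaque", "auto"},
    "input_fidelity": {"high", "low"},
    "moderation": {"auto", "low"},
    "output_format": {"png", "webp", "jpeg"},
    "quality": {"low", "medium", "high", "auto"},
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
}

def validate_tool_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if config.get("type") != "image_generation":
        problems.append('type must be "image_generation"')
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} not in {sorted(allowed)}")
    return problems
```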

Parameters in Detail

type

Identifies the tool configuration as an image generation request.

Required value:

"type": "image_generation"

This parameter is required.


action

Controls whether the tool generates a new image, edits an existing image, or decides automatically.

Accepted values:

Value Meaning
"generate" Create a new image from the prompt or instructions.
"edit" Modify an existing input image.
"auto" Let the system choose between generation and editing behavior.

Default:

"auto"

background

Controls the background style of the generated image.

Accepted values:

Value Meaning
"transparent" Generate an image with transparency where supported.
"opaque" Generate an image with a solid, non-transparent background.
"auto" Let the system choose the appropriate background handling.

Default:

"auto"

input_fidelity

Controls how closely the output should preserve details from supplied input images.

Accepted values:

Value Meaning
"high" Stronger preservation of input image details, style, and features. Useful when editing faces, likenesses, or specific visual identities.
"low" Looser preservation of input details. Allows more variation from the input image.

Default:

"low"

Model support:

  • Supported by gpt-image-1
  • Supported by gpt-image-1.5
  • Not supported by gpt-image-1-mini

input_image_mask

Provides a mask image for inpainting or targeted editing.

The mask can be supplied either as a file ID or as a base64-encoded image string.

Example using a file ID:

{
  "input_image_mask": {
    "file_id": "file_abc123"
  }
}

Example using a base64-encoded image:

{
  "input_image_mask": {
    "image_url": "data:image/png;base64,..."
  }
}

Subfields:

| Field | Required | Accepted Values | Description |
| --- | --- | --- | --- |
| file_id | No | String | ID of a previously uploaded mask image file. |
| image_url | No | String | Base64-encoded mask image. |

Notes:

  • Use input_image_mask when only part of an image should be edited.
  • The mask identifies the area to modify during inpainting.
  • At least one of file_id or image_url should be provided when using a mask.
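
The base64 form above is a standard data URL, which can be produced with nothing but the standard library. A sketch; the byte string here is a stand-in, not a valid PNG:

```python
import base64

def mask_as_data_url(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in the data-URL form accepted by image_url."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")

# Stand-in bytes; in practice, read a real mask file: open("mask.png", "rb").read()
mask_field = {"input_image_mask": {"image_url": mask_as_data_url(b"\x89PNG...")}}
```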

model

Selects the image generation model.

Accepted values include:

Value Description
"gpt-image-1" Default image generation model.
"gpt-image-1-mini" Smaller image generation model. Some advanced features may not be supported.
"gpt-image-1.5" Newer image generation model with support for advanced image features.
Any other supported model name as a string Allows specifying another compatible image model if available.

Default:

"gpt-image-1"

moderation

Controls the moderation level used for image generation.

Accepted values:

Value Meaning
"auto" Use the default moderation behavior.
"low" Use a lower moderation level where available.

Default:

"auto"

output_compression

Controls the compression level of the output image.

Accepted value:

Number

Default:

100

Notes:

  • This is most relevant for compressed output formats such as "jpeg" and "webp".
  • Higher values generally mean less compression and higher image quality.
  • Lower values generally mean more compression and smaller file size.

output_format

Sets the output image file format.

Accepted values:

Value Meaning
"png" PNG image output. Useful for lossless images and transparency.
"webp" WebP image output. Useful for compressed web images.
"jpeg" JPEG image output. Useful for photographs and compressed images without transparency.

Default:

"png"

partial_images

Controls how many partial images are generated while streaming.

Accepted value:

0, 1, 2, or 3

Default:

0

Meaning:

Value Meaning
0 Do not generate partial image previews.
1 Generate one partial image preview.
2 Generate two partial image previews.
3 Generate three partial image previews.
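
When streaming, each partial preview arrives as its own event. A sketch of collecting them; the event type string and the field carrying the base64 payload are my assumptions from the streaming docs, so verify the exact names against the current reference:

```python
import base64

def collect_partials(events) -> list[bytes]:
    """Collect decoded partial-image previews from a stream of event dicts.
    The event type and payload field names are assumptions, not guaranteed."""
    previews = []
    for event in events:
        if event.get("type") == "response.image_generation_call.partial_image":
            previews.append(base64.b64decode(event["partial_image_b64"]))
    return previews

# Simulated stream containing one partial preview event:
fake_events = [
    {"type": "response.created"},
    {"type": "response.image_generation_call.partial_image",
     "partial_image_b64": base64.b64encode(b"preview-bytes").decode()},
]
```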

quality

Controls the image quality level.

Accepted values:

Value Meaning
"low" Lower quality, generally faster or less expensive.
"medium" Balanced quality.
"high" Higher quality, generally slower or more expensive.
"auto" Let the system choose the appropriate quality level.

Default:

"auto"

size

Sets the output image dimensions.

Accepted values:

Value Orientation
"1024x1024" Square
"1024x1536" Portrait
"1536x1024" Landscape
"auto" Automatically selected

Default:

"auto"

Compact Example: Generate a Square PNG

{
  "type": "image_generation",
  "action": "generate",
  "model": "gpt-image-1",
  "size": "1024x1024",
  "quality": "auto",
  "output_format": "png"
}

Compact Example: Edit an Image With High Input Fidelity

{
  "type": "image_generation",
  "action": "edit",
  "model": "gpt-image-1.5",
  "input_fidelity": "high",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}

Compact Example: Use a Mask for Inpainting

{
  "type": "image_generation",
  "action": "edit",
  "model": "gpt-image-1.5",
  "input_image_mask": {
    "file_id": "file_abc123"
  },
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}
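
For completeness, pulling the finished image out of the response: the generated image comes back base64-encoded on the image generation call output item. A sketch against a plain-dict view of the payload; the "image_generation_call" item type and its "result" field are assumptions here, so verify them against the current API reference:

```python
import base64

def extract_images(response: dict) -> list[bytes]:
    """Decode generated image bytes from a Responses API payload (as a dict).
    Item type and field names are assumptions; check the API reference."""
    images = []
    for item in response.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images

# Simulated response payload:
fake_response = {
    "output": [
        {"type": "message", "content": [{"type": "output_text", "text": "Done!"}]},
        {"type": "image_generation_call",
         "result": base64.b64encode(b"\x89PNG fake image bytes").decode()},
    ]
}
```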