How is pricing calculated when using /v1/responses with gpt-image-1?

Hi, when I use the /v1/responses endpoint with the gpt-image-1 tool, how can I calculate the exact cost of my request?

To my mind there should be both a TEXT cost and an IMAGE cost, but the response JSON only reports usage as input and output tokens, with no mention of the image.

Here is an example.

Request JSON
{
    "model": "gpt-4.1-mini",
    "input": "Generate an image of gray tabby cat hugging an otter with an orange scarf",
    "tools": [
        {
            "type": "image_generation"
        }
    ]
}
Response JSON
{
  "id": "resp_0180d3b13d71b7a60068c5895175fc819682fdc45be47e8c3a",
  "object": "response",
  "created_at": 1757776209,
  "status": "completed",
  "background": false,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "model": "gpt-4.1-mini-2025-04-14",
  "output": [
    {
      "id": "ig_0180d3b13d71b7a60068c589526ff88196987aebdd12e1230d",
      "type": "image_generation_call",
      "status": "completed",
      "background": "opaque",
      "output_format": "png",
      "quality": "high",
      "result": "[base64-encoded image data]",
      "revised_prompt": "A gray tabby cat hugging an otter. The otter is wearing a bright orange scarf. The scene is cute and heartwarming, with both animals showing a friendly and affectionate gesture. The background is simple and soft to highlight the animals.",
      "size": "1024x1024"
    },
    {
      "id": "msg_0180d3b13d71b7a60068c5897bdc9c81968b01b524128bfab0",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "logprobs": [],
          "text": "Here is an image of a gray tabby cat hugging an otter wearing an orange scarf. If you need any changes or another image, feel free to ask!"
        }
      ],
      "role": "assistant"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": null,
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "store": true,
  "temperature": 1.0,
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "medium"
  },
  "tool_choice": "auto",
  "tools": [
    {
      "type": "image_generation",
      "background": "auto",
      "moderation": "auto",
      "n": 1,
      "output_compression": 100,
      "output_format": "png",
      "quality": "auto",
      "size": "auto"
    }
  ],
  "top_logprobs": 0,
  "top_p": 1.0,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 2285,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 96,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 2381
  },
  "user": null,
  "metadata": {}
}

As we can see, usage.input_tokens is 2285 and output_tokens is 96. An image was generated by gpt-image-1, but there is no information about its cost. How can I calculate it?

I would appreciate any reply. Best regards!

You cannot. The AI may do any number of things based on an input, and the context of images sent to the image model is undocumented.

“I’m sorry, but I can’t make that image” is relatively cheap.
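
If an estimate is good enough, though, the tool call does report size and quality, and OpenAI's pricing documentation lists a fixed output-token count per size/quality combination. A rough sketch of that mapping — the token figures below are assumptions copied from the pricing table at time of writing, so verify them against the current page before relying on them:

```python
# Rough estimate of the image half of a Responses API call.
# The per-image output token counts below are ASSUMED from OpenAI's
# gpt-image-1 pricing table as of writing -- verify before billing.
IMAGE_OUTPUT_TOKENS = {
    # (size, quality): approximate output image tokens
    ("1024x1024", "low"): 272,
    ("1024x1024", "medium"): 1056,
    ("1024x1024", "high"): 4160,
    ("1024x1536", "low"): 408,
    ("1024x1536", "medium"): 1584,
    ("1024x1536", "high"): 6240,
    ("1536x1024", "low"): 400,
    ("1536x1024", "medium"): 1568,
    ("1536x1024", "high"): 6208,
}

def estimate_image_output_tokens(size: str, quality: str) -> int:
    """Estimate output image tokens from image_generation_call fields."""
    return IMAGE_OUTPUT_TOKENS[(size, quality)]

# The sample response above reported size "1024x1024", quality "high":
print(estimate_image_output_tokens("1024x1024", "high"))  # prints 4160
```

This still misses the undocumented input context sent to the image model, so treat it as a floor, not an exact bill.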


Wouldn’t it be simple for the output image token count to just be added to the image_generation_call tool output? That seems like it would be the most straightforward solution.

Great idea!

For example, were you to make an API call to one of the new models via the images endpoint:

{'model': 'chatgpt-image-latest', 'prompt': 'If tuna fish could talk', 'size': '1024x1024', 'timeout': 240, 'user': 'image-editor-user', 'output_format': 'png', 'quality': 'medium', 'background': 'opaque'}

Then, in the JSON that contains the b64 data of your image, you would also have “usage”:

{'input_tokens': 11, 'input_tokens_details': {'image_tokens': 0, 'text_tokens': 11}, 'output_tokens': 1470, 'total_tokens': 1481, 'output_tokens_details': {'image_tokens': 1056, 'text_tokens': 414}}
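
Given a usage object with that breakdown, cost is just token counts times per-token rates. A minimal sketch — the rates below are placeholder assumptions standing in for the per-million-token prices on OpenAI's pricing page, and output text tokens are deliberately left out since their billing for this endpoint is not shown in the thread:

```python
# Sketch: turn an images-endpoint "usage" object into a dollar figure.
# RATES_PER_MILLION values are hypothetical placeholders -- substitute
# the current per-million-token prices from OpenAI's pricing page.
RATES_PER_MILLION = {
    "text_input": 5.00,    # assumed USD per 1M text input tokens
    "image_input": 10.00,  # assumed USD per 1M image input tokens
    "image_output": 40.00, # assumed USD per 1M image output tokens
}

def image_call_cost(usage: dict) -> float:
    """Price the token categories exposed in the usage details."""
    inp = usage["input_tokens_details"]
    out = usage["output_tokens_details"]
    tokens = {
        "text_input": inp["text_tokens"],
        "image_input": inp["image_tokens"],
        "image_output": out["image_tokens"],
        # output text_tokens omitted: pricing not shown in this thread
    }
    return sum(tokens[k] * RATES_PER_MILLION[k] / 1_000_000 for k in tokens)

usage = {
    "input_tokens": 11,
    "input_tokens_details": {"image_tokens": 0, "text_tokens": 11},
    "output_tokens": 1470,
    "output_tokens_details": {"image_tokens": 1056, "text_tokens": 414},
    "total_tokens": 1481,
}
print(f"${image_call_cost(usage):.6f}")  # → $0.042295
```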

See the API reference image generation response object and see if that meets your needs: https://platform.openai.com/docs/api-reference/images/object

There is nothing to observe if you use OpenAI’s hosted Responses API tools: no token report in the response.image_generation_call.completed event, and no “usage” beyond the chat model’s own context. But nothing stops you from making your own function for image generation, which also lets you control what goes “in”, including the input image costs taken from the entire conversation’s “vision”. You might then even have a bit of control over users prompting and jailbreaking your app into making a dozen images.
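
A sketch of what such a do-it-yourself function tool could look like — the tool name generate_image and its schema are hypothetical, while the SDK calls are from the official openai Python package, whose Images API response does carry a usage object:

```python
# Sketch: skip the hosted image_generation tool and expose your own
# function tool that calls the Images API directly, so per-call
# "usage" stays visible. "generate_image" and the schema below are
# hypothetical names for illustration.
import base64

# Function-tool schema to pass to /v1/responses instead of
# {"type": "image_generation"}.
IMAGE_TOOL = {
    "type": "function",
    "name": "generate_image",
    "description": "Create an image from a text prompt.",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "size": {"type": "string",
                     "enum": ["1024x1024", "1024x1536", "1536x1024"]},
            "quality": {"type": "string",
                        "enum": ["low", "medium", "high"]},
        },
        "required": ["prompt"],
    },
}

def generate_image(prompt: str, size: str = "1024x1024",
                   quality: str = "medium"):
    """Handle the tool call yourself so per-image usage is observable."""
    from openai import OpenAI  # lazy import; requires the openai SDK
    client = OpenAI()
    resp = client.images.generate(
        model="gpt-image-1", prompt=prompt, size=size, quality=quality,
    )
    image_bytes = base64.b64decode(resp.data[0].b64_json)
    return image_bytes, resp.usage  # usage has image/text token splits
```

You would then feed the returned image back into the conversation yourself and log resp.usage on every call.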

@_j for proxies like LiteLLM, it would just be great if this info could be added to the managed tool output. For example, Google’s Nano Banana models for multimodal output do include the output image token count, so the proxy (or whatever downstream application) can monitor costs per API request.

Having to build a separate custom function / tool defeats the purpose of the OpenAI / Azure OpenAI provided tools and makes multimodal output much more challenging.


And therein lies the difference: OpenAI does not let you “chat” with an AI model that can natively generate images. On the images endpoint, your “prompt” text is likely still containerized in a task-based message telling the AI what to produce, on top of image-only fine-tuning.

An image tool report would need a more encompassing internal collector of costs, perhaps even a “tool usage” object not yet envisioned (which could also report directly on file search fees, automatic code containers, etc.).

@_j I'm not sure what you’re arguing here. The underlying “tool” is just calling gpt-image-*, which does enumerate and count image tokens in and out, so it could quite easily return usage along with the tool output. It already returns plenty of other tool-specific information, such as "background": "opaque" and "output_format": "png"; why not usage as well? It seems like a simple API output update for OpenAI to make.
