gpt-image-1.5 Images API returns text output tokens - can we disable this?

I’m using the new gpt-image-1.5 model with the Images API (/v1/images/generations) and noticed it now returns text output tokens in addition to image tokens.

My actual response usage:

Usage(
    input_tokens=48, 
    input_tokens_details=UsageInputTokensDetails(image_tokens=0, text_tokens=48), 
    output_tokens=481, 
    total_tokens=529, 
    output_tokens_details={
        'image_tokens': 272, 
        'text_tokens': 209  # <-- This is new in gpt-image-1.5
    }
)

With gpt-image-1, the Images API only returned image tokens in output_tokens. Now with gpt-image-1.5, I’m consistently seeing ~200+ text output tokens on every generation, even for simple prompts.
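For anyone wanting to monitor this, here is a minimal sketch of pulling the new per-type breakdown out of a usage payload. The field names are taken from the response shown above; the helper itself is just illustrative, not part of the SDK:

```python
# Sketch: extract the per-type output token counts from an Images API
# usage payload (field names from the gpt-image-1.5 response above).

def output_token_breakdown(usage: dict) -> tuple[int, int]:
    """Return (image_tokens, text_tokens) from a usage dict."""
    details = usage.get("output_tokens_details") or {}
    return details.get("image_tokens", 0), details.get("text_tokens", 0)

usage = {
    "input_tokens": 48,
    "output_tokens": 481,
    "total_tokens": 529,
    "output_tokens_details": {"image_tokens": 272, "text_tokens": 209},
}

image_toks, text_toks = output_token_breakdown(usage)
print(image_toks, text_toks)  # 272 209
```

With the SDK's `Usage` object you'd read the same fields off `response.usage` instead of a plain dict.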

I couldn’t find any documentation explaining this behavior in the official guides.

Questions:

  1. What is the purpose of these text output tokens?

  2. Is this text output visible anywhere in the response, or is it internal only?

  3. Is there a way to disable this, or a parameter we can use to opt out?

For high-volume image generation use cases, this adds additional cost that wasn’t present with gpt-image-1.

Thanks!

2 Likes

I’m honestly not sure, but I can offer a hypothesis on why it exists:

In ChatGPT, when you generate an image, text tokens could be kept around in case the user later asks about something that was done to the image. That way, the model can stick to just text instead of processing the image again…

2 Likes

Interesting thought, but I’m using the standalone Images API (/v1/images/generations), not the Responses API, so it should be a stateless call with no conversation context. Even if text tokens are generated, there’s no way to reference them in subsequent calls…

1 Like

e.g. responses endpoint → generations endpoint → loop

1 Like

Here’s why you are seeing what you are seeing: this model has a new category to bill you for generated text:

[Pricing table: columns for input text, cached input, output text]

I also note no “20% cheaper” there…

What is that text? It can apparently only be inferred. Reasoning tokens?:

GPT-image-1.5 has built-in reasoning and strong world knowledge. For example, when asked to generate a scene set in Bethel, New York in August 1969, it can infer Woodstock and produce an accurate, context-appropriate image without being explicitly told about the event.

You are now getting billed for the AI internally talking about denying your request? Probably no opt-out.

One penny per thousand tokens. The API reference describes the field only as “The number of text output tokens generated by the model,” and its example response shows a minimizing 10 tokens of text output instead of reality (to go along with the example’s 40 image tokens instead of 6240).
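Taking that quoted rate at face value, the surcharge per image is easy to estimate. A rough sketch; the $0.01-per-1K figure is from the quote above, so verify it against the pricing page before relying on it:

```python
# Rough surcharge estimate for the undocumented text output tokens,
# billed at the quoted $0.01 per 1,000 tokens (rate assumed from the
# post above; check the official pricing page before relying on it).

TEXT_OUTPUT_RATE_PER_1K = 0.01  # USD per 1,000 text output tokens (assumed)

def text_surcharge(text_tokens: int) -> float:
    """Extra cost in USD for one generation's text output tokens."""
    return text_tokens / 1000 * TEXT_OUTPUT_RATE_PER_1K

print(f"${text_surcharge(209):.5f}")  # roughly $0.00209 per image at 209 tokens
print(f"${text_surcharge(412):.5f}")  # roughly $0.00412 per image at 412 tokens
```

Small per image, but it compounds quickly for high-volume generation.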

The image generation documentation cost section has no mention of this additional cost where it should be, as you note.

1 Like

ImagesResponse(
    created=1765960791,
    background='opaque',
    data=[Image(b64_json='iVBORw0KGgoAAAAN..........AAAAElFTkSuQmCC', revised_prompt=None, url=None)],
    output_format='png',
    quality='medium',
    size='1024x1024',
    usage=Usage(
        input_tokens=114,
        input_tokens_details=UsageInputTokensDetails(image_tokens=0, text_tokens=114),
        output_tokens=1468,
        total_tokens=1582,
        output_tokens_details={'image_tokens': 1056, 'text_tokens': 412}
    )
)

I tried checking the entire response too, and it looks like there is no visible output text anywhere either. I also noticed that the token count varies per request (209-412 in my tests) with no apparent pattern.
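One way to double-check that nothing is surfaced: walk the response as a dict and list every string field other than the image payload. A sketch only, using the field names from the response shown above:

```python
# Sketch: walk an Images API response (as a dict) and collect every
# string field other than the base64 image payload, to check whether
# any generated text is surfaced anywhere in the response.

def visible_text_fields(obj, path="", skip=("b64_json",)):
    """Return (dotted_path, value) pairs for all string leaves."""
    found = []
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key in skip:
                continue
            found += visible_text_fields(val, f"{path}.{key}" if path else key, skip)
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            found += visible_text_fields(val, f"{path}[{i}]", skip)
    elif isinstance(obj, str):
        found.append((path, obj))
    return found

response = {
    "created": 1765960791,
    "background": "opaque",
    "data": [{"b64_json": "iVBORw0KGgo...", "revised_prompt": None, "url": None}],
    "output_format": "png",
    "quality": "medium",
    "size": "1024x1024",
}

for path, text in visible_text_fields(response):
    print(path, "=", text)
# Prints only metadata (background, output_format, quality, size) --
# no generated text appears anywhere in the response body.
```

With the SDK you could run the same check over `response.model_dump()`, which should confirm the same thing: the billed text never appears in the payload.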

Therefore, it seems likely that gpt-image-1.5 charges for text output tokens that don’t exist anywhere in the API response (probably reasoning tokens, as we suspected). For now, I’ll wait for the official documentation to be updated.

Thanks for looking into it!

This pricing page states that text output includes model reasoning tokens: https://openai.com/api/pricing/