API documentation for image generation has the wrong response object

This is a request for a targeted “patch”, scoped to the generated API reference. I could write much more about the shortcomings in describing how the vision handling of edit input dimensions, input_fidelity dimensions, and mask RGB+A layers actually works.

OpenAPI YAML specification

Issue: incorrect response objects within the non-streaming reference schemas. Two examples both carry “created”: 1713833628, which indicates they should be the same request example, yet they differ in shape.

application/json response:
‘#/components/schemas/ImagesResponse’

x-oaiMeta:
  name: The image generation response
  group: images
  example: |
    {
      "created": 1713833628,
      "data": [
        {
          "b64_json": "..."
        }
      ],
      "background": "transparent",
      "output_format": "png",
      "size": "1024x1024",
      "quality": "high",
      "usage": {
        "total_tokens": 100,
        "input_tokens": 50,
        "output_tokens": 50,
        "input_tokens_details": {
          "text_tokens": 10,
          "image_tokens": 40
        }
      }
    }

However, the code-flavored response object, as seen adjacent to “create image” in the API reference, omits the parameters echoed back for the model: background, output_format, size, and quality.

            response: |
              {
                "created": 1713833628,
                "data": [
                  {
                    "b64_json": "..."
                  }
                ],
                "usage": {
                  "total_tokens": 100,
                  "input_tokens": 50,
                  "output_tokens": 50,
                  "input_tokens_details": {
                    "text_tokens": 10,
                    "image_tokens": 40
                  }
                }
              }
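
To make the mismatch concrete, here is a minimal sketch that compares the top-level keys of the two documentation examples quoted above. The key sets are copied from those examples; the helper name `missing_keys` is only illustrative.

```python
# Top-level keys of the two documentation examples quoted above.
schema_example_keys = {
    "created", "data", "background", "output_format", "size", "quality", "usage",
}
endpoint_example_keys = {"created", "data", "usage"}


def missing_keys(reference, candidate):
    """Return keys present in `reference` but absent from `candidate`, sorted."""
    return sorted(set(reference) - set(candidate))


print(missing_keys(schema_example_keys, endpoint_example_keys))
# ['background', 'output_format', 'quality', 'size']
```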

Concerns:

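  • all echoed fields are missing in some API examples, even though the inclusion of token usage indicates a gpt-image model was called;
  • doc vs response sort mismatch: the doc example is unsorted after the required “created” appearing first, while actual responses return the remaining keys alphabetically;
  • DALL·E fields such as “revised_prompt” and “url” have been dropped from the examples.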

The token counts are also a work of fiction; four- or five-digit integers are what realistically gets billed.

The edits endpoint on the API reference page has no adjacent response object at all.

It seems the spec YAML is always capped at 2MB; good luck.

Actual response shapes being returned

=== gpt-image-1.5 | Images API response (edit) ===

{
  "created": 1767332319,
  "background": "opaque",
  "data": [
    {
      "b64_json": "<1.86MB b64>"
    }
  ],
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 24165,
    "input_tokens_details": {
      "image_tokens": 24043,
      "text_tokens": 122
    },
    "output_tokens": 1504,
    "output_tokens_details": {
      "image_tokens": 1056,
      "text_tokens": 448
    },
    "total_tokens": 25669
  }
}

=== gpt-image-1-mini | Images API response (generate) ===

{
  "created": 1767358194,
  "background": "opaque",
  "data": [
    {
      "b64_json": "<2.63MB b64>"
    }
  ],
  "output_format": "png",
  "quality": "medium",
  "size": "1536x1024",
  "usage": {
    "input_tokens": 24,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 24
    },
    "output_tokens": 1568,
    "total_tokens": 1592
  }
}
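
Since output_tokens_details shows up in only one of the two responses above, client code probably should not assume it exists. A minimal sketch of reading usage defensively, using the numbers from those two responses; `summarize_usage` is a hypothetical helper, not part of any SDK.

```python
def summarize_usage(usage: dict) -> dict:
    """Flatten a usage object, tolerating a missing output_tokens_details."""
    in_details = usage.get("input_tokens_details") or {}
    out_details = usage.get("output_tokens_details")  # absent in the mini response
    return {
        "input_text": in_details.get("text_tokens"),
        "input_image": in_details.get("image_tokens"),
        "output_text": out_details.get("text_tokens") if out_details else None,
        "output_image": out_details.get("image_tokens") if out_details else None,
        # Sanity check: does total equal input plus output?
        "total_consistent": (
            usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"]
        ),
    }


# Usage objects copied from the two responses above.
edit_usage = {
    "input_tokens": 24165,
    "input_tokens_details": {"image_tokens": 24043, "text_tokens": 122},
    "output_tokens": 1504,
    "output_tokens_details": {"image_tokens": 1056, "text_tokens": 448},
    "total_tokens": 25669,
}
generate_usage = {
    "input_tokens": 24,
    "input_tokens_details": {"image_tokens": 0, "text_tokens": 24},
    "output_tokens": 1568,
    "total_tokens": 1592,
}

print(summarize_usage(edit_usage))
print(summarize_usage(generate_usage))
```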

The shapes above should be useful for developers reading along.

Try out a response schema for gpt-image-x models only, either as validation (including validation that an image was actually returned) or as a way of communicating the shape to an AI.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.local/schemas/openai.images.gpt-image.response.schema.json",
  "title": "OpenAI Images API response (GPT image models)",
  "type": "object",
  "required": ["created", "data"],
  "additionalProperties": true,
  "properties": {
    "created": {
      "type": "integer",
      "description": "Unix timestamp (seconds)."
    },
    "data": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["b64_json"],
        "additionalProperties": true,
        "properties": {
          "b64_json": {
            "type": "string",
            "description": "Base64-encoded image bytes (PNG/JPEG/WebP depending on output_format)."
          }
        }
      }
    },

    "background": {
      "type": "string",
      "description": "Echoed background mode.",
      "enum": ["transparent", "opaque", "auto"]
    },
    "output_format": {
      "type": "string",
      "description": "Echoed or employed output file format.",
      "enum": ["png", "jpeg", "webp"]
    },
    "size": {
      "type": "string",
      "description": "Echoed or employed output size."
    },
    "quality": {
      "type": "string",
      "description": "Echoed or selected quality.",
      "enum": ["low", "medium", "high", "auto"]
    },

    "usage": {
      "type": "object",
      "description": "Token usage (present for GPT image models; shape can vary by model).",
      "required": ["input_tokens", "output_tokens", "total_tokens"],
      "additionalProperties": true,
      "properties": {
        "input_tokens": { "type": "integer" },
        "output_tokens": { "type": "integer" },
        "total_tokens": { "type": "integer" },

        "input_tokens_details": {
          "type": "object",
          "additionalProperties": true,
          "required": ["text_tokens", "image_tokens"],
          "properties": {
            "text_tokens": { "type": "integer" },
            "image_tokens": { "type": "integer" }
          }
        },

        "output_tokens_details": {
          "type": "object",
          "additionalProperties": true,
          "description": "present on gpt-image-1.5 due to text reasoning.",
          "properties": {
            "text_tokens": { "type": "integer" },
            "image_tokens": { "type": "integer" }
          }
        }
      }
    }
  }
}
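
For the “validation that an image was returned too” part, a JSON Schema can only confirm that b64_json is a string. A minimal standard-library sketch of going one step further is below: it checks the required fields from the schema above, decodes the base64, and sniffs the file signature. The function names are hypothetical, and the check is deliberately shallow (it does not decode the full image).

```python
import base64


def sniff_image_format(raw: bytes):
    """Guess png/jpeg/webp from the leading file signature, else None."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    return None


def check_image_response(resp: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not isinstance(resp.get("created"), int):
        problems.append("created: missing or not an integer")
    data = resp.get("data")
    if not isinstance(data, list) or not data:
        problems.append("data: missing or empty")
        return problems
    for i, item in enumerate(data):
        b64 = item.get("b64_json") if isinstance(item, dict) else None
        if not isinstance(b64, str):
            problems.append(f"data[{i}].b64_json: missing or not a string")
            continue
        try:
            raw = base64.b64decode(b64, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            problems.append(f"data[{i}].b64_json: not valid base64")
            continue
        fmt = sniff_image_format(raw)
        if fmt is None:
            problems.append(f"data[{i}]: bytes are not PNG, JPEG, or WebP")
        elif "output_format" in resp and fmt != resp["output_format"]:
            problems.append(
                f"data[{i}]: looks like {fmt}, but output_format is "
                f"{resp['output_format']}"
            )
    return problems
```

Called on a parsed response body, `check_image_response(resp)` returns an empty list when the required fields are present and every item decodes to a recognizable image; anything else comes back as human-readable strings that can be logged or handed to an AI alongside the schema.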