gpt-image-2 output pricing calculator: online, and in your code!

OpenAI made this opaque. I’m pleased to present:

Interactive image output token and cost calculator: much easier to use than OpenAI’s own calculator (if you can even find it), which likes to say “invalid” after your input.

That’s based on the algorithm behind cost calculations for gpt-image-2. If you like JavaScript, steal away from there.

Simple code

Bringing this forum back around to “stuff for developers” instead of “complaints from developers”, here’s the money shot: Python code that computes output tokens from quality and size as inputs (non-validating):

IMAGE_MODEL_SPECS = {
    "gpt-image-2": {
        "size_limits": {
            "step_px": 16,
            "min_pixels": 655_360,
            "max_pixels": 8_294_400,
            "max_dimension_px": 3_840,
            "max_aspect_ratio": 3.0,
        },
        "quality_axis_factors": {"low": 16, "medium": 48, "high": 96},
        "token_area_offset_pixels": 2_000_000,
        "token_area_scale_denominator": 4_000_000,
        "image_output_price_per_million_tokens": 30.00,
    },
}

def calculate_image_tokens(quality, width, height, model="gpt-image-2"):
    spec = IMAGE_MODEL_SPECS[model]
    quality_axis_factor = spec["quality_axis_factors"].get(quality)
    if quality_axis_factor is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")

    long_edge = max(width, height)
    short_edge = min(width, height)
    short_axis_factor = (
        2 * quality_axis_factor * short_edge + long_edge
    ) // (2 * long_edge)

    return (
        quality_axis_factor
        * short_axis_factor
        * (spec["token_area_offset_pixels"] + width * height)
        + spec["token_area_scale_denominator"]
        - 1
    ) // spec["token_area_scale_denominator"]

Code with utilities included

  • Validate image size: returns one error message per input-limit “rule” violated; an empty list means “success”
  • Normalize: re-shape a desired resolution into one that works
  • Recommend cheaper: within step bands there are cheaper resolutions with a more rectangular aspect ratio; we find them and recommend the size string!
  • Tokens to dollars: because math is hard; this reads the price from the model’s spec entry (so possible future models just need their own entry)

Includes a demo so you can exercise them all:

GPT image size helpers demo
Enter sizes like 1200x1600. Blank input exits.

Size: *333x355*
333x355 is invalid for gpt-image-2:
- Width and height must both be divisible by 16.
- Pixel budget must be at least 655,360 pixels, inclusive.
We can fix that in code, though!
Normalized: 333x355 -> 784x848
-- costs for 784x848 --
low: 160 tokens, $0.004800
  cheaper larger is 784x880: 151 tokens, $0.004530
medium: 1,408 tokens, $0.042240
  cheaper larger is 784x880: 1,388 tokens, $0.041640
high: 5,693 tokens, $0.170790
  cheaper larger is 784x864: 5,591 tokens, $0.167730

Python helper utilities

from math import ceil, floor


IMAGE_MODEL_SPECS = {
    "gpt-image-2": {
        "size_limits": {
            "step_px": 16,
            "min_pixels": 655_360,
            "max_pixels": 8_294_400,
            "max_dimension_px": 3_840,
            "max_aspect_ratio": 3.0,
        },
        "quality_axis_factors": {"low": 16, "medium": 48, "high": 96},
        "token_area_offset_pixels": 2_000_000,
        "token_area_scale_denominator": 4_000_000,
        "image_output_price_per_million_tokens": 30.00,
    },
}

def calculate_image_tokens(quality, width, height, model="gpt-image-2"):
    spec = IMAGE_MODEL_SPECS[model]
    quality_axis_factor = spec["quality_axis_factors"].get(quality)
    if quality_axis_factor is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")

    long_edge = max(width, height)
    short_edge = min(width, height)
    short_axis_factor = (
        2 * quality_axis_factor * short_edge + long_edge
    ) // (2 * long_edge)

    return (
        quality_axis_factor
        * short_axis_factor
        * (spec["token_area_offset_pixels"] + width * height)
        + spec["token_area_scale_denominator"]
        - 1
    ) // spec["token_area_scale_denominator"]


def validate_image_size(width, height, model="gpt-image-2"):
    limits = IMAGE_MODEL_SPECS[model]["size_limits"]
    step = limits["step_px"]
    min_pixels = limits["min_pixels"]
    max_pixels = limits["max_pixels"]
    max_dimension = limits["max_dimension_px"]
    max_ratio = limits["max_aspect_ratio"]

    if type(width) is not int or type(height) is not int or width <= 0 or height <= 0:
        return ["Enter whole-number width and height values greater than 0."]

    pixels = width * height
    long_edge = max(width, height)
    short_edge = min(width, height)
    errors = []

    if width % step != 0 or height % step != 0:
        errors.append(f"Width and height must both be divisible by {step}.")
    if pixels > max_pixels:
        errors.append(
            f"Pixel budget must be no greater than {max_pixels:,} pixels, inclusive."
        )
    if pixels < min_pixels:
        errors.append(
            f"Pixel budget must be at least {min_pixels:,} pixels, inclusive."
        )
    if long_edge > max_dimension:
        errors.append(
            f"Maximum edge length must be less than or equal to {max_dimension:,}px."
        )
    if long_edge > max_ratio * short_edge:
        errors.append(f"Aspect ratio must be no greater than {max_ratio:g}:1.")

    return errors


def normalize(width, height, model="gpt-image-2"):
    limits = IMAGE_MODEL_SPECS[model]["size_limits"]
    step = limits["step_px"]
    min_area = ceil(limits["min_pixels"] / (step * step))
    max_area = floor(limits["max_pixels"] / (step * step))
    max_side = floor(limits["max_dimension_px"] / step)
    max_ratio = float(limits["max_aspect_ratio"])

    width = max(1, int(width))
    height = max(1, int(height))
    ratio = max(1.0 / max_ratio, min(max_ratio, width / height))

    if ratio >= 1.0:
        max_area = min(max_area, max_side * max(1, floor(max_side / ratio)))
    else:
        max_area = min(max_area, max_side * max(1, floor(max_side * ratio)))

    pixels = width * height
    if pixels < min_area * step * step:
        area = min_area
    elif pixels > max_area * step * step:
        area = max_area
    else:
        area = pixels / (step * step)

    target_w = (area * ratio) ** 0.5
    target_h = (area / ratio) ** 0.5
    choices = []

    for h in {floor(target_h) - 1, floor(target_h), ceil(target_h), ceil(target_h) + 1}:
        if 1 <= h <= max_side:
            lo = max(1, ceil(min_area / h), ceil(h / max_ratio))
            hi = min(max_side, floor(max_area / h), floor(h * max_ratio))
            if lo <= hi:
                w = min(hi, max(lo, round(ratio * h)))
                choices.append((w, h))

    for w in {floor(target_w) - 1, floor(target_w), ceil(target_w), ceil(target_w) + 1}:
        if 1 <= w <= max_side:
            lo = max(1, ceil(min_area / w), ceil(w / max_ratio))
            hi = min(max_side, floor(max_area / w), floor(w * max_ratio))
            if lo <= hi:
                h = min(hi, max(lo, round(w / ratio)))
                choices.append((w, h))

    best = min(
        choices,
        key=lambda size: (
            ((size[0] - target_w) / target_w) ** 2
            + ((size[1] - target_h) / target_h) ** 2,
            abs(size[0] * size[1] - area),
        ),
    )
    return best[0] * step, best[1] * step


def recommend_cheaper_larger_size(model, size, quality):
    if isinstance(size, str):
        width, height = map(int, size.lower().split()[0].split("x"))
    else:
        width, height = size

    if validate_image_size(width, height, model):
        return None

    spec = IMAGE_MODEL_SPECS[model]
    q = spec["quality_axis_factors"].get(quality)
    if q is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")

    limits = spec["size_limits"]
    step = limits["step_px"]
    max_dimension = (limits["max_dimension_px"] // step) * step
    long_side = max(width, height)
    short_side = min(width, height)
    grow_width = width >= height
    tokens = calculate_image_tokens(quality, width, height, model)

    if long_side > short_side:
        prev_long = long_side - step
        prev_size = (prev_long, short_side) if grow_width else (short_side, prev_long)
        if not validate_image_size(prev_size[0], prev_size[1], model):
            if calculate_image_tokens(quality, prev_size[0], prev_size[1], model) > tokens:
                return None

    max_long = min(
        max_dimension,
        (int(limits["max_aspect_ratio"] * short_side) // step) * step,
        (limits["max_pixels"] // short_side // step) * step,
    )
    band = (2 * q * short_side + long_side) // (2 * long_side)

    for next_band in range(band - 1, 0, -1):
        threshold = (2 * q * short_side) // (2 * next_band + 1) + 1
        candidate_long = max(long_side + step, ((threshold + step - 1) // step) * step)
        if candidate_long > max_long:
            return None

        candidate_size = (
            (candidate_long, short_side)
            if grow_width
            else (short_side, candidate_long)
        )
        if calculate_image_tokens(
            quality,
            candidate_size[0],
            candidate_size[1],
            model,
        ) < tokens:
            return f"{candidate_size[0]}x{candidate_size[1]}"

    return None

def image_tokens_to_dollars(tokens, model="gpt-image-2"):
    price_per_million = IMAGE_MODEL_SPECS[model]["image_output_price_per_million_tokens"]
    return tokens * price_per_million / 1_000_000

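# Alternate interactive demo: choose one quality, then enter several sizes.
# Call demo_by_quality() to use it; running the file calls demo() below.
def demo_by_quality():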
    model = "gpt-image-2"
    qualities = ["low", "medium", "high"]

    print("GPT image size helper demo")
    print("Enter image sizes like 1200x1600.")
    print("Press Enter at the size prompt to choose another quality.")
    print("Press Enter at the quality prompt, or enter an invalid quality choice, to exit.")

    while True:
        print()
        print("Quality choices:")
        for index, quality in enumerate(qualities, start=1):
            print(f"  {index}. {quality}")

        try:
            quality_choice = input("Choose quality 1, 2, or 3: ").strip()
        except EOFError:
            print()
            return

        if not quality_choice:
            return

        if quality_choice not in {"1", "2", "3"}:
            return

        quality = qualities[int(quality_choice) - 1]

        print()
        print(f"Using quality: {quality}")

        while True:
            try:
                size_text = input("Size: ").strip()
            except EOFError:
                print()
                return

            if not size_text:
                break

            try:
                clean_size_text = size_text.lower().replace(" ", "")

                if "x" in clean_size_text:
                    parts = clean_size_text.split("x")
                    if len(parts) != 2:
                        raise ValueError
                    width = int(parts[0])
                    height = int(parts[1])
                else:
                    width = int(clean_size_text)
                    height_text = input("Height: ").strip()
                    if not height_text:
                        break
                    height = int(height_text)

            except ValueError:
                print("Enter a size like 1200x1600, or enter a whole-number width.")
                continue

            original_width = width
            original_height = height

            errors = validate_image_size(width, height, model)

            if errors:
                print()
                print(f"{original_width}x{original_height} is not a valid request size:")
                for error in errors:
                    print(f"  - {error}")

                width, height = normalize(width, height, model)
                print(f"Normalized size: {original_width}x{original_height} -> {width}x{height}")
            else:
                print()
                print(f"{width}x{height} is valid.")
                print("No normalization needed.")

            tokens = calculate_image_tokens(quality, width, height, model)
            cheaper_size = recommend_cheaper_larger_size(model, (width, height), quality)

            print(f"Output tokens: {tokens:,}")
            print("Image output cost: "
                  f"${image_tokens_to_dollars(tokens, model):.6f}")
            if cheaper_size:
                cheaper_tokens = calculate_image_tokens(
                    quality,
                    *map(int, cheaper_size.split("x")),
                    model,
                )
                print(
                    f"Cheaper larger size: {cheaper_size} "
                    f"({cheaper_tokens:,} output tokens)"
                )
            else:
                print("Cheaper larger size: none found")

            print()

def demo():
    model = "gpt-image-2"
    qualities = ["low", "medium", "high"]
    print("GPT image size helpers demo")
    print("Enter sizes like 1200x1600. Blank input exits.")

    while True:
        text = input("\nSize: ").strip().lower().replace(" ", "")
        if not text:
            return

        try:
            if "x" in text:
                width, height = map(int, text.split("x"))
            else:
                width = int(text)
                height = int(input("Height: ").strip())
        except ValueError:
            print("Enter a size like 1200x1600, or a whole-number width.")
            continue

        errors = validate_image_size(width, height, model)
        if errors:
            old_size = f"{width}x{height}"
            print(f"{old_size} is invalid for {model}:")
            for error in errors:
                print(f"- {error}")
            print("We can fix that in code, though!")
            width, height = normalize(width, height, model)
            print(f"Normalized: {old_size} -> {width}x{height}")
        else:
            print(f"{width}x{height} is valid.")

        print(f"-- costs for {width}x{height} --")
        for quality in qualities:
            tokens = calculate_image_tokens(quality, width, height, model)
            cost = image_tokens_to_dollars(tokens, model)
            cheaper_size = recommend_cheaper_larger_size(model, (width, height), quality)

            print(f"{quality}: {tokens:,} tokens, ${cost:.6f}")

            if cheaper_size:
                cheaper_width, cheaper_height = map(int, cheaper_size.split("x"))
                cheaper_tokens = calculate_image_tokens(
                    quality,
                    cheaper_width,
                    cheaper_height,
                    model,
                )
                cheaper_cost = image_tokens_to_dollars(cheaper_tokens, model)
                print(
                    f"  cheaper larger is {cheaper_size}: "
                    f"{cheaper_tokens:,} tokens, ${cheaper_cost:.6f}"
                )
            else:
                print("  cheaper larger: none")

if __name__ == "__main__":
    demo()
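To cross-check `recommend_cheaper_larger_size`, a brute-force sketch can simply walk the long edge upward in 16 px steps and return the first valid size that costs fewer tokens. The names here (`brute_force_cheaper_larger`, `is_valid`, `tokens_for`) are illustrative and not part of the helpers above. One difference to expect: the helper returns None when the step just below your size costs more (you are already at the bottom of a band), whereas this search keeps looking, so for medium 1216x1200 it reports 1248x1200.

```python
from typing import Optional

Q_BY_QUALITY = {"low": 16, "medium": 48, "high": 96}


def is_valid(width: int, height: int) -> bool:
    """Same size rules as validate_image_size, returned as a single bool."""
    long_edge, short_edge = max(width, height), min(width, height)
    return (
        width % 16 == 0 and height % 16 == 0
        and 655_360 <= width * height <= 8_294_400
        and long_edge <= 3_840 and long_edge <= 3 * short_edge
    )


def tokens_for(quality: str, width: int, height: int) -> int:
    q = Q_BY_QUALITY[quality]
    long_edge, short_edge = max(width, height), min(width, height)
    band = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    return (q * band * (2_000_000 + width * height) + 3_999_999) // 4_000_000  # ceil


def brute_force_cheaper_larger(quality: str, width: int, height: int) -> Optional[str]:
    """First size with the same short edge and a longer long edge that costs fewer tokens."""
    if not is_valid(width, height):
        return None
    base = tokens_for(quality, width, height)
    long_edge, short_edge = max(width, height), min(width, height)
    grow_width = width >= height
    for candidate_long in range(long_edge + 16, 3_840 + 1, 16):
        size = (candidate_long, short_edge) if grow_width else (short_edge, candidate_long)
        if not is_valid(*size):
            break  # growing further only moves further past the size limits
        if tokens_for(quality, *size) < base:
            return f"{size[0]}x{size[1]}"
    return None


print(brute_force_cheaper_larger("high", 784, 848))      # 784x864
print(brute_force_cheaper_larger("medium", 1216, 1200))  # 1248x1200
```

For the 784x848 demo size, this agrees with the helper at all three qualities (784x880, 784x880, 784x864).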

Python with a verbose breakdown of what’s being computed, reverse-engineered. Input validation and error messages included.

from typing import Final, Literal

## Validation constants

# Size validation happens before token calculation. The accepted size space is a
# 16 px lattice, so a dimension that is only 1 px away from a valid value can
# still have no token price.
SIZE_GRANULARITY_PX: Final[int] = 16

# The pixel budget is inclusive at both ends. These limits are checked against
# width * height, independent of the aspect-ratio band used later.
MIN_PIXEL_BUDGET: Final[int] = 655_360
MAX_PIXEL_BUDGET: Final[int] = 8_294_400

# The maximum edge rule is separate from total pixel budget. An image can be
# under the pixel budget and still be invalid if either side is too long.
MAX_EDGE_LENGTH_PX: Final[int] = 3_840

# Aspect ratio is checked as long_edge / short_edge <= 3. The exact 3:1 case is
# valid; only ratios greater than 3:1 are rejected.
MAX_ASPECT_RATIO: Final[int] = 3


def validate_image_size(width: int, height: int) -> list[str]:
    """
    Return validation messages for dimensions that cannot receive a token price.

    An empty list means the dimensions are eligible for token calculation.
    """
    if type(width) is not int or type(height) is not int or width <= 0 or height <= 0:
        return ["Enter whole-number width and height values greater than 0."]

    errors: list[str] = []

    if width % SIZE_GRANULARITY_PX != 0 or height % SIZE_GRANULARITY_PX != 0:
        errors.append("Width and height must both be divisible by 16.")

    pixel_budget = width * height

    if pixel_budget > MAX_PIXEL_BUDGET:
        errors.append(
            f"Pixel budget must be no greater than "
            f"{MAX_PIXEL_BUDGET:,} pixels, inclusive."
        )

    if pixel_budget < MIN_PIXEL_BUDGET:
        errors.append(
            f"Pixel budget must be at least "
            f"{MIN_PIXEL_BUDGET:,} pixels, inclusive."
        )

    long_edge = max(width, height)
    short_edge = min(width, height)

    if long_edge > MAX_EDGE_LENGTH_PX:
        errors.append(
            f"Maximum edge length must be less than or equal to "
            f"{MAX_EDGE_LENGTH_PX:,}px."
        )

    if long_edge > MAX_ASPECT_RATIO * short_edge:
        errors.append("Aspect ratio must be no greater than 3:1.")

    return errors

## Algorithm constants

Quality = Literal["low", "medium", "high"]

# The quality setting enters the calculation as an integer axis factor.
# For square images, the pre-area token grid is this value squared:
# low=16*16, medium=48*48, high=96*96.
QUALITY_AXIS_FACTORS: Final[dict[Quality, int]] = {
    "low": 16,
    "medium": 48,
    "high": 96,
}

# The final area multiplier is:
#
#     (AREA_OFFSET_PIXELS + width * height) / AREA_SCALE_DENOMINATOR
#
# The positive offset means the token count is not proportional to image area
# alone. At the minimum valid pixel budget, the offset is larger than the image
# itself, so smaller valid images still carry a substantial fixed area term.
AREA_OFFSET_PIXELS: Final[int] = 2_000_000
AREA_SCALE_DENOMINATOR: Final[int] = 4_000_000


def _round_half_up_ratio(numerator: int, denominator: int) -> int:
    """
    Round numerator / denominator to the nearest integer, with exact halves up.

    The aspect component falls into integer bands rather than staying continuous.
    Using integer arithmetic keeps boundary cases deterministic and avoids
    moving a half-step threshold because of binary floating-point representation.
    """
    return (2 * numerator + denominator) // (2 * denominator)


def _ceil_div(numerator: int, denominator: int) -> int:
    """
    Return ceil(numerator / denominator) for positive integers.

    Token totals are whole numbers. Any non-zero fractional remainder in the
    scaled calculation increases the reported total to the next integer token.
    """
    return (numerator + denominator - 1) // denominator


def calculate_image_tokens(quality: Quality, width: int, height: int) -> int:
    """
    Return the output token count for an image setting.

    The calculation is symmetric in width and height. Rotating an image does not
    change the result because only the long edge, short edge, and total area are
    used.

    Raises:
        ValueError: If quality is not one of "low", "medium", or "high", or if
            the dimensions violate the size rules.
    """
    if quality not in QUALITY_AXIS_FACTORS:
        allowed = ", ".join(repr(value) for value in QUALITY_AXIS_FACTORS)
        raise ValueError(f"quality must be one of {allowed}; got {quality!r}")

    errors = validate_image_size(width, height)
    if errors:
        raise ValueError("Invalid image size:\n- " + "\n- ".join(errors))

    long_edge = max(width, height)
    short_edge = min(width, height)

    quality_axis_factor = QUALITY_AXIS_FACTORS[quality]

    # The longer side keeps the full quality axis factor. The shorter side is
    # reduced according to short_edge / long_edge and then rounded into an
    # integer band.
    #
    # This rounded band is the source of the visible downward jumps in resolution
    # tables: as one edge grows, area rises gradually, but the aspect band can
    # drop by 1 at a threshold. That one-band drop can outweigh the added pixels.
    short_axis_factor = _round_half_up_ratio(
        quality_axis_factor * short_edge,
        long_edge,
    )

    # The calculation behaves like a rectangular token grid:
    #
    #     full quality axis factor * aspect-adjusted short-axis factor
    #
    # For square images, short_axis_factor equals quality_axis_factor, so the
    # grid is exactly quality_axis_factor squared.
    token_grid = quality_axis_factor * short_axis_factor

    pixel_budget = width * height

    # Area then scales the token grid with a fixed positive offset. Keeping this
    # as integer arithmetic gives an exact final ceiling instead of depending on
    # floating-point rounding near token boundaries.
    scaled_token_numerator = token_grid * (AREA_OFFSET_PIXELS + pixel_budget)

    return _ceil_div(scaled_token_numerator, AREA_SCALE_DENOMINATOR)


def main() -> None:
    quality: Quality = "medium"
    width = 1200
    height = 1472
    try:
        tokens = calculate_image_tokens(quality, width, height)
        print(f"quality={quality}, width={width}, height={height} -> {tokens} tokens")
    except ValueError as v:
        print(f"-- ValueError --\n{v}")


if __name__ == "__main__":
    main()

Documentation

If OpenAI were to explain “how to calculate costs” in natural language, here’s what it might look like.

Summary

Get out paper and a calculator…

How to calculate gpt-image-2 image output token cost

For gpt-image-2, the output cost for an image is determined from:

  1. the requested quality: low, medium, or high
  2. the image width and height in pixels
  3. a model-specific aspect-ratio adjustment
  4. a model-specific area multiplier
  5. the model price of $30 per 1,000,000 output tokens

The final result is a whole-number output token count. Dollar cost is then calculated from that token count.


1. First, check whether the size is priceable

Before calculating tokens, the image dimensions must pass all size rules.

Let:

pixel_budget = width * height
long_edge = max(width, height)
short_edge = min(width, height)

The image is valid only if all of the following are true:

  • Width and height are positive whole numbers.
  • Both dimensions are divisible by 16.
  • The total pixel count is at least 655,360 pixels.
  • The total pixel count is no greater than 8,294,400 pixels.
  • Neither side is longer than 3,840 pixels.
  • The aspect ratio is no greater than 3:1.

That last rule means:

long_edge / short_edge <= 3

or equivalently:

long_edge <= 3 * short_edge

An exact 3:1 image is valid. Anything wider or taller than 3:1 is not.

If the image fails these checks, it does not receive a token price from this formula.
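The rules above can be checked with plain integer arithmetic; comparing long_edge <= 3 * short_edge avoids any floating-point division at the exact 3:1 boundary. A minimal sketch (the function name `is_priceable` is hypothetical):

```python
def is_priceable(width: int, height: int) -> bool:
    """True if (width, height) passes every gpt-image-2 size rule listed above."""
    if width <= 0 or height <= 0:
        return False
    long_edge, short_edge = max(width, height), min(width, height)
    pixel_budget = width * height
    return (
        width % 16 == 0
        and height % 16 == 0
        and 655_360 <= pixel_budget <= 8_294_400  # inclusive at both ends
        and long_edge <= 3_840
        and long_edge <= 3 * short_edge  # exact 3:1 is allowed
    )


for size in [(3600, 1200), (3616, 1200), (800, 800), (816, 816), (333, 355)]:
    print(size, is_priceable(*size))
```

Expected: True, False (3616 is more than 3 * 1200), False (640,000 pixels is under the minimum), True, False.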


2. Choose the quality axis factor

Each quality level maps to a model-specific integer factor:

low     -> 16
medium  -> 48
high    -> 96

Call this value:

quality_axis_factor

For example, with medium quality:

quality_axis_factor = 48

This factor acts like the token-grid size for the long side of the image.

For square images, the pre-area token grid is:

quality_axis_factor * quality_axis_factor

So before the area multiplier:

low square grid     = 16 * 16 = 256
medium square grid  = 48 * 48 = 2,304
high square grid    = 96 * 96 = 9,216

3. Normalize the image orientation

The calculation is symmetric in width and height.

A 1200 x 1472 image and a 1472 x 1200 image produce the same token count.

Use:

long_edge = max(width, height)
short_edge = min(width, height)

The long edge keeps the full quality axis factor.

The short edge receives an adjusted factor based on the image’s aspect ratio.


4. Calculate the aspect-adjusted short-axis factor

The short side does not keep the full quality factor unless the image is square.

Instead, calculate:

raw_short_axis = quality_axis_factor * short_edge / long_edge

Then round that value to the nearest whole number.

Exact halves round upward.

So:

short_axis_factor = round_half_up(
    quality_axis_factor * short_edge / long_edge
)

For example, with medium quality and a 1472 x 1200 image:

quality_axis_factor = 48
long_edge = 1472
short_edge = 1200

raw_short_axis = 48 * 1200 / 1472
               = 39.130...

short_axis_factor = 39

This short_axis_factor is an integer aspect-ratio band.

That integer band is very important: it is the reason prices can move in steps instead of changing smoothly.
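If you port this step, beware that Python’s built-in round() rounds exact halves to the nearest even integer, so it disagrees with round-half-up exactly on band boundaries. The integer form used in the verbose code (_round_half_up_ratio) avoids both that and floating-point error. A small sketch, using 672 x 1024 at low quality (a valid size where 16 * 672 / 1024 is exactly 10.5) as the boundary case:

```python
def round_half_up_ratio(numerator: int, denominator: int) -> int:
    """Nearest integer to numerator / denominator, exact halves rounded up (positive ints)."""
    return (2 * numerator + denominator) // (2 * denominator)


# Boundary case: low quality (q = 16), 672 x 1024 -> 16 * 672 / 1024 == 10.5 exactly.
print(round_half_up_ratio(16 * 672, 1024))  # 11, the half-up band
print(round(16 * 672 / 1024))               # 10, round-half-to-even picks the even neighbor

# The worked example above: medium quality (q = 48), 1472 x 1200.
print(round_half_up_ratio(48 * 1200, 1472))  # 39
```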


Aside: why a larger image can sometimes cost fewer tokens

The formula has two competing parts:

  1. Area increases gradually as width or height increases.
  2. The aspect-ratio band decreases in whole-number steps as the image gets farther from square.

The short-axis factor is rounded to an integer:

short_axis_factor = round_half_up(
    quality_axis_factor * short_edge / long_edge
)

As the aspect ratio becomes wider or taller, this value goes down.

But it does not go down smoothly. It stays fixed for a while, then drops by 1.

Within a single band, the short_axis_factor is fixed. If one dimension increases, the pixel area increases, so the token count usually rises slightly.

But when the image crosses into the next aspect-ratio band, the short_axis_factor drops by 1. That reduces the token grid all at once.

That drop can be larger than the small increase caused by the extra pixels.

So the token count can look like this:

same band:       area increases -> token count creeps upward
band boundary:   short-axis factor drops -> token count jumps downward
same band:       area increases -> token count creeps upward again

This creates a sawtooth pattern.

For example, with height pinned at 1200 and quality medium:

1200 x 1200:
    short_axis_factor = 48
    tokens = 1982

1216 x 1200:
    short_axis_factor = 47
    tokens = 1951

1232 x 1200:
    short_axis_factor = 47
    tokens = 1962

1248 x 1200:
    short_axis_factor = 46
    tokens = 1931

The 1216 x 1200 image has more pixels than 1200 x 1200, but it costs fewer tokens because it crossed from aspect band 48 down to aspect band 47.

Then 1232 x 1200 is still in band 47, so the larger area raises the token count slightly.

Then 1248 x 1200 crosses into band 46, so the token count drops again.

For a fixed 1200 height, this also explains the larger pattern:

  • As width increases toward 1200, the image becomes more square, so the aspect band tends to rise and the token count rises.
  • After width passes 1200, the image becomes wider, so the aspect band tends to fall and the token count falls in steps.
  • Within each band, area still causes small upward movement.
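To watch the sawtooth yourself, this sketch sweeps the width in 16 px steps with the height pinned at 1200, medium quality, and prints the band next to the token count. The helper name `band_and_tokens` is illustrative; the formula is the one described in these steps.

```python
Q_MEDIUM = 48  # medium quality axis factor


def band_and_tokens(q: int, width: int, height: int) -> tuple[int, int]:
    long_edge, short_edge = max(width, height), min(width, height)
    band = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    numerator = q * band * (2_000_000 + width * height)
    return band, (numerator + 4_000_000 - 1) // 4_000_000  # ceiling division


for width in range(1200, 1264, 16):
    band, tokens = band_and_tokens(Q_MEDIUM, width, 1200)
    print(f"{width} x 1200: band {band}, {tokens} tokens")
```

This reproduces the four rows of the example: bands 48, 47, 47, 46 with 1982, 1951, 1962, 1931 tokens.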

5. Calculate the token grid

After calculating the short-axis factor, multiply it by the quality axis factor:

token_grid = quality_axis_factor * short_axis_factor

For the medium, 1472 x 1200 example:

quality_axis_factor = 48
short_axis_factor = 39

token_grid = 48 * 39
           = 1872

This is not the final token count yet.

It is the aspect-adjusted grid before applying the area multiplier.


6. Calculate the area multiplier

The image area is used with a fixed positive offset.

First calculate:

pixel_budget = width * height

Then calculate the model-specific area term:

area_multiplier = (2,000,000 + pixel_budget) / 4,000,000

The 2,000,000 pixel offset means the token count is not proportional to image area alone.

Smaller valid images still carry a substantial fixed area term.

For example, with a 1472 x 1200 image:

pixel_budget = 1472 * 1200
             = 1,766,400

area_multiplier = (2,000,000 + 1,766,400) / 4,000,000
                = 3,766,400 / 4,000,000
                = 0.9416
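Python’s fractions.Fraction keeps the area multiplier exact, which makes it easy to see how much of it is the fixed offset. This sketch (the helper name `area_multiplier` is just for illustration) evaluates it at the worked example and at both ends of the valid pixel budget:

```python
from fractions import Fraction

AREA_OFFSET_PIXELS = 2_000_000
AREA_SCALE_DENOMINATOR = 4_000_000


def area_multiplier(pixel_budget: int) -> Fraction:
    """Exact (2,000,000 + pixel_budget) / 4,000,000."""
    return Fraction(AREA_OFFSET_PIXELS + pixel_budget, AREA_SCALE_DENOMINATOR)


for label, pixels in [("1472 x 1200", 1472 * 1200), ("minimum", 655_360), ("maximum", 8_294_400)]:
    m = area_multiplier(pixels)
    # Fraction of the numerator contributed by the fixed offset rather than the image.
    offset_share = Fraction(AREA_OFFSET_PIXELS, AREA_OFFSET_PIXELS + pixels)
    print(f"{label}: multiplier {float(m):.6f}, offset share {float(offset_share):.1%}")
```

At the minimum pixel budget the offset is about 75% of the area term; at the maximum it is still about 19%.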

7. Apply the area multiplier and round up

The final output token count is:

tokens = ceil(token_grid * area_multiplier)

Equivalently:

tokens = ceil(
    quality_axis_factor
    * short_axis_factor
    * (2,000,000 + width * height)
    / 4,000,000
)

The final rounding is always upward to the next whole token.

For the medium, 1472 x 1200 example:

token_grid = 1872
area_multiplier = 0.9416

raw_tokens = 1872 * 0.9416
           = 1762.6752

tokens = ceil(1762.6752)
       = 1763

So:

medium, 1472 x 1200 -> 1763 output tokens
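Steps 5 through 7 can be done entirely in integers: multiply first, then do one ceiling division at the end, so there is no intermediate 0.9416 to round. A sketch for the medium, 1472 x 1200 example (ceil_div mirrors the _ceil_div helper in the verbose code):

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """ceil(numerator / denominator) for positive integers, without floats."""
    return (numerator + denominator - 1) // denominator


quality_axis_factor = 48  # medium
short_axis_factor = 39    # from step 4 for 1472 x 1200
pixel_budget = 1472 * 1200

token_grid = quality_axis_factor * short_axis_factor  # step 5
numerator = token_grid * (2_000_000 + pixel_budget)   # steps 6-7, still exact
tokens = ceil_div(numerator, 4_000_000)               # single rounding, upward
print(token_grid, numerator, tokens)  # 1872 7050700800 1763
```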

8. Convert output tokens to dollars

The model price is:

$30 per 1,000,000 output tokens

So the dollar cost is:

dollar_cost = tokens * 30 / 1,000,000

or:

dollar_cost = tokens * 0.00003

For the 1763 token example:

dollar_cost = 1763 * 30 / 1,000,000
            = 0.05289

So:

medium, 1472 x 1200 -> 1763 output tokens -> $0.05289
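If you sum or display many image costs, binary floats can introduce tiny representation errors; decimal.Decimal keeps each amount exact. A minimal sketch using the $30 per million output-token price above (`tokens_to_dollars` is an illustrative name, similar in spirit to the image_tokens_to_dollars helper earlier):

```python
from decimal import Decimal

PRICE_PER_MILLION = Decimal("30")  # USD per 1,000,000 gpt-image-2 image output tokens


def tokens_to_dollars(tokens: int) -> Decimal:
    """Exact dollar cost for a whole number of output tokens."""
    return Decimal(tokens) * PRICE_PER_MILLION / Decimal(1_000_000)


print(tokens_to_dollars(1763))  # 0.05289

# Summing several images stays exact (token counts taken from examples in this post).
batch = [1763, 151, 5693]
print(sum(tokens_to_dollars(t) for t in batch))  # 0.22821
```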

Compact formula

For a valid image:

q = quality axis factor

low     -> q = 16
medium  -> q = 48
high    -> q = 96

long_edge = max(width, height)
short_edge = min(width, height)

short_axis_factor = round_half_up(q * short_edge / long_edge)

tokens = ceil(
    q
    * short_axis_factor
    * (2,000,000 + width * height)
    / 4,000,000
)

dollars = tokens * 30 / 1,000,000

The important non-obvious behavior is that short_axis_factor is an integer band. As aspect ratio moves away from square, that band drops in steps. Those drops can outweigh the gradual increase from extra image area, which is why a larger non-square image can sometimes cost fewer output tokens than a smaller, more square one.
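The compact formula translates almost line for line into a function. This sketch checks only the quality name (size validation is step 1) and returns both the token count and the dollar cost; `estimate_image_output` is a hypothetical name.

```python
from decimal import Decimal

Q_BY_QUALITY = {"low": 16, "medium": 48, "high": 96}


def estimate_image_output(quality: str, width: int, height: int) -> tuple[int, Decimal]:
    """Return (output tokens, USD cost) for an already-valid gpt-image-2 size."""
    q = Q_BY_QUALITY[quality]  # KeyError for an unknown quality
    long_edge, short_edge = max(width, height), min(width, height)
    short_axis_factor = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    numerator = q * short_axis_factor * (2_000_000 + width * height)
    tokens = (numerator + 3_999_999) // 4_000_000  # ceiling division by 4,000,000
    dollars = Decimal(tokens) * 30 / 1_000_000
    return tokens, dollars


print(estimate_image_output("medium", 1472, 1200))  # (1763, Decimal('0.05289'))
```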


Result

Look forward to your favorite image-creation apps telling you not just your input cost in language tokens (plus the “vision” image-input component), but also your output and total cost, before you press the “send” button!

The loopy stuff - example of steps in costs

What’s peculiar is that as the aspect ratio moves away from square, even with one dimension pinned and the other growing (so the area increases), the price actually drops, in step bands.

Below are example step bands. Each new line is a jump; once the aspect ratio is moving away from square, each jump lands on a lower price, after the small linear increases inside the previous band.

If we pin height to 1200px, and then go through ranges of widths, what kind of token ranges do we get?

(NNNN x 1200), quality low, width range: token cost range across those widths
560- 560: 75
576- 624: 87- 88 # approaching square: pick the last width in a band (most pixels before the next jump up)
640- 704: 100-103
720- 784: 115-118
800- 848: 131-133
864- 928: 146-150
944-1008: 163-167
1024-1072: 181-185
1088-1152: 199-203
1168-1232: 218-223
1248-1312: 210-215 # past square: pick the first width in a band (cheapest, right after a jump down)
1328-1408: 202-207
1424-1536: 193-200
1552-1664: 186-192
1680-1824: 177-185
1840-2016: 169-177
2032-2256: 160-170
2272-2560: 152-163
2576-2944: 143-155
2960-3488: 134-149
3504-3600: 125-127

Each row is a “band”. Moving to the next row is a jump: upward through the band that contains the square 1200 x 1200 size, and downward after it.
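The band table can be regenerated rather than read off: sweep every valid width at height 1200 in 16 px steps, group consecutive widths that share a short-axis band, and report each group’s width and token range. A sketch under the same size rules and formula as above (the names `is_valid`, `band_and_tokens`, and `bands_for_height` are illustrative); single-width bands print as e.g. 75-75.

```python
Q_BY_QUALITY = {"low": 16, "medium": 48, "high": 96}


def is_valid(width: int, height: int) -> bool:
    long_edge, short_edge = max(width, height), min(width, height)
    return (
        width % 16 == 0 and height % 16 == 0
        and 655_360 <= width * height <= 8_294_400
        and long_edge <= 3_840 and long_edge <= 3 * short_edge
    )


def band_and_tokens(q: int, width: int, height: int) -> tuple[int, int]:
    long_edge, short_edge = max(width, height), min(width, height)
    band = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    numerator = q * band * (2_000_000 + width * height)
    return band, (numerator + 3_999_999) // 4_000_000  # ceiling division


def bands_for_height(quality: str, height: int) -> list[tuple[int, int, int, int]]:
    """Rows of (first_width, last_width, min_tokens, max_tokens), one per run of equal bands."""
    q = Q_BY_QUALITY[quality]
    rows: list[tuple[int, int, int, int]] = []
    previous_band = None
    for width in range(16, 3_840 + 1, 16):
        if not is_valid(width, height):
            continue
        band, tokens = band_and_tokens(q, width, height)
        if band != previous_band:
            rows.append((width, width, tokens, tokens))  # a new band starts here
            previous_band = band
        else:
            first, _, low, high = rows[-1]
            rows[-1] = (first, width, min(low, tokens), max(high, tokens))
    return rows


for first, last, low, high in bands_for_height("low", 1200):
    print(f"{first:4d}-{last:4d}: {low}-{high}")
```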

(NNNN x 1200), quality medium: each row gives the width where a band starts, then the token cost at that width and at each further 16 px step inside the same band.

Band start width (height 1200): tokens at +0px, +16px, +32px…
1200: 1982
1216: 1951, 1962
1248: 1931, 1942
1280: 1910
1296: 1878, 1888
1328: 1855, 1865
1360: 1831, 1841
1392: 1806, 1816
1424: 1781, 1790, 1799
1472: 1763, 1772
1504: 1735, 1744, 1753
1552: 1715, 1724
1584: 1686, 1694, 1702
1632: 1663, 1671, 1679
1680: 1639, 1647, 1655
1728: 1614, 1621, 1629
1776: 1587, 1594, 1602, 1609
1840: 1566, 1573, 1580, 1587
1904: 1543, 1550, 1557, 1564
1968: 1518, 1525, 1532, 1538
2032: 1492, 1498, 1505, 1511
2096: 1463, 1470, 1476, 1482, 1488
2176: 1439, 1445, 1451, 1457, 1463, 1469
2272: 1418, 1424, 1430, 1436, 1441
2352: 1389, 1395, 1400, 1406, 1411, 1417, 1423
2464: 1369, 1374, 1379, 1384, 1390, 1395, 1400
2576: 1345, 1350, 1355, 1360, 1365, 1370, 1375
2688: 1317, 1322, 1327, 1332, 1337, 1342, 1346, 1351
2816: 1292, 1296, 1301, 1305, 1310, 1315, 1319, 1324, 1328
2960: 1266, 1271, 1275, 1279, 1284, 1288, 1293, 1297, 1301, 1306
3120: 1241, 1245, 1249, 1254, 1258, 1262, 1266, 1270, 1274, 1279, 1283
3296: 1215, 1219, 1223, 1227, 1231, 1235, 1239, 1243, 1247, 1251, 1255, 1258, 1262
3504: 1192, 1196, 1199, 1203, 1207, 1210, 1214