OpenAI made this opaque. I’m pleased to present:
An interactive image output token and cost calculator, much easier to use than OpenAI’s own calculator (if you can find it), which likes to say “invalid” after your input.
It’s based on the algorithm behind cost calculations for gpt-image-2. If you like JavaScript, steal away from there.
Simple code
Bringing this forum around to “stuff for developers” instead of “complaints from developers”, here’s the money shot: Python code that computes output tokens from quality and size inputs (non-validating).
IMAGE_MODEL_SPECS = {
    "gpt-image-2": {
        "size_limits": {
            "step_px": 16,
            "min_pixels": 655_360,
            "max_pixels": 8_294_400,
            "max_dimension_px": 3_840,
            "max_aspect_ratio": 3.0,
        },
        "quality_axis_factors": {"low": 16, "medium": 48, "high": 96},
        "token_area_offset_pixels": 2_000_000,
        "token_area_scale_denominator": 4_000_000,
        "image_output_price_per_million_tokens": 30.00,
    },
}


def calculate_image_tokens(quality, width, height, model="gpt-image-2"):
    spec = IMAGE_MODEL_SPECS[model]
    quality_axis_factor = spec["quality_axis_factors"].get(quality)
    if quality_axis_factor is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")
    long_edge = max(width, height)
    short_edge = min(width, height)
    # Round quality * short / long to the nearest integer band, halves up.
    short_axis_factor = (
        2 * quality_axis_factor * short_edge + long_edge
    ) // (2 * long_edge)
    # Ceiling division of the area-scaled token grid.
    return (
        quality_axis_factor
        * short_axis_factor
        * (spec["token_area_offset_pixels"] + width * height)
        + spec["token_area_scale_denominator"]
        - 1
    ) // spec["token_area_scale_denominator"]
Code with utilities included
- Validate image size: returns an error message for each input-limit rule violated; an empty list means success
- Normalize: re-shapes a desired resolution into one that works
- Recommend cheaper: within the step bands, there are cheaper resolutions with a more rectangular aspect ratio to be found; we find them and recommend the size string!
- Tokens to dollars: because math is hard; this reads the price from the model’s spec entry (and works for possible future models)
A demo is included so you can exercise them all:
GPT image size helpers demo
Enter sizes like 1200x1600. Blank input exits.
Size: *333x355*
333x355 is invalid for gpt-image-2:
- Width and height must both be divisible by 16.
- Pixel budget must be at least 655,360 pixels, inclusive.
We can fix that in code, though!
Normalized: 333x355 -> 784x848
-- costs for 784x848 --
low: 160 tokens, $0.004800
cheaper larger is 784x880: 151 tokens, $0.004530
medium: 1,408 tokens, $0.042240
cheaper larger is 784x880: 1,388 tokens, $0.041640
high: 5,693 tokens, $0.170790
cheaper larger is 784x864: 5,591 tokens, $0.167730
Python helper utilities
from math import ceil, floor

IMAGE_MODEL_SPECS = {
    "gpt-image-2": {
        "size_limits": {
            "step_px": 16,
            "min_pixels": 655_360,
            "max_pixels": 8_294_400,
            "max_dimension_px": 3_840,
            "max_aspect_ratio": 3.0,
        },
        "quality_axis_factors": {"low": 16, "medium": 48, "high": 96},
        "token_area_offset_pixels": 2_000_000,
        "token_area_scale_denominator": 4_000_000,
        "image_output_price_per_million_tokens": 30.00,
    },
}


def calculate_image_tokens(quality, width, height, model="gpt-image-2"):
    spec = IMAGE_MODEL_SPECS[model]
    quality_axis_factor = spec["quality_axis_factors"].get(quality)
    if quality_axis_factor is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")
    long_edge = max(width, height)
    short_edge = min(width, height)
    short_axis_factor = (
        2 * quality_axis_factor * short_edge + long_edge
    ) // (2 * long_edge)
    return (
        quality_axis_factor
        * short_axis_factor
        * (spec["token_area_offset_pixels"] + width * height)
        + spec["token_area_scale_denominator"]
        - 1
    ) // spec["token_area_scale_denominator"]


def validate_image_size(width, height, model="gpt-image-2"):
    limits = IMAGE_MODEL_SPECS[model]["size_limits"]
    step = limits["step_px"]
    min_pixels = limits["min_pixels"]
    max_pixels = limits["max_pixels"]
    max_dimension = limits["max_dimension_px"]
    max_ratio = limits["max_aspect_ratio"]
    if type(width) is not int or type(height) is not int or width <= 0 or height <= 0:
        return ["Enter whole-number width and height values greater than 0."]
    pixels = width * height
    long_edge = max(width, height)
    short_edge = min(width, height)
    errors = []
    if width % step != 0 or height % step != 0:
        errors.append(f"Width and height must both be divisible by {step}.")
    if pixels > max_pixels:
        errors.append(
            f"Pixel budget must be no greater than {max_pixels:,} pixels, inclusive."
        )
    if pixels < min_pixels:
        errors.append(
            f"Pixel budget must be at least {min_pixels:,} pixels, inclusive."
        )
    if long_edge > max_dimension:
        errors.append(
            f"Maximum edge length must be less than or equal to {max_dimension:,}px."
        )
    if long_edge > max_ratio * short_edge:
        errors.append(f"Aspect ratio must be no greater than {max_ratio:g}:1.")
    return errors


def normalize(width, height, model="gpt-image-2"):
    limits = IMAGE_MODEL_SPECS[model]["size_limits"]
    step = limits["step_px"]
    # Work in units of one step (16 px) so every candidate is on the lattice.
    min_area = ceil(limits["min_pixels"] / (step * step))
    max_area = floor(limits["max_pixels"] / (step * step))
    max_side = floor(limits["max_dimension_px"] / step)
    max_ratio = float(limits["max_aspect_ratio"])
    width = max(1, int(width))
    height = max(1, int(height))
    # Clamp the requested aspect ratio into the allowed range.
    ratio = max(1.0 / max_ratio, min(max_ratio, width / height))
    if ratio >= 1.0:
        max_area = min(max_area, max_side * max(1, floor(max_side / ratio)))
    else:
        max_area = min(max_area, max_side * max(1, floor(max_side * ratio)))
    pixels = width * height
    if pixels < min_area * step * step:
        area = min_area
    elif pixels > max_area * step * step:
        area = max_area
    else:
        area = pixels / (step * step)
    target_w = (area * ratio) ** 0.5
    target_h = (area / ratio) ** 0.5
    choices = []
    for h in {floor(target_h) - 1, floor(target_h), ceil(target_h), ceil(target_h) + 1}:
        if 1 <= h <= max_side:
            lo = max(1, ceil(min_area / h), ceil(h / max_ratio))
            hi = min(max_side, floor(max_area / h), floor(h * max_ratio))
            if lo <= hi:
                w = min(hi, max(lo, round(ratio * h)))
                choices.append((w, h))
    for w in {floor(target_w) - 1, floor(target_w), ceil(target_w), ceil(target_w) + 1}:
        if 1 <= w <= max_side:
            lo = max(1, ceil(min_area / w), ceil(w / max_ratio))
            hi = min(max_side, floor(max_area / w), floor(w * max_ratio))
            if lo <= hi:
                h = min(hi, max(lo, round(w / ratio)))
                choices.append((w, h))
    # Pick the candidate closest to the target shape, then to the target area.
    best = min(
        choices,
        key=lambda size: (
            ((size[0] - target_w) / target_w) ** 2
            + ((size[1] - target_h) / target_h) ** 2,
            abs(size[0] * size[1] - area),
        ),
    )
    return best[0] * step, best[1] * step


def recommend_cheaper_larger_size(model, size, quality):
    if isinstance(size, str):
        width, height = map(int, size.lower().split()[0].split("x"))
    else:
        width, height = size
    if validate_image_size(width, height, model):
        return None
    spec = IMAGE_MODEL_SPECS[model]
    q = spec["quality_axis_factors"].get(quality)
    if q is None:
        allowed = "', '".join(spec["quality_axis_factors"])
        raise ValueError(f"quality must be one of '{allowed}'; got {quality!r}")
    limits = spec["size_limits"]
    step = limits["step_px"]
    max_dimension = (limits["max_dimension_px"] // step) * step
    long_side = max(width, height)
    short_side = min(width, height)
    grow_width = width >= height
    tokens = calculate_image_tokens(quality, width, height, model)
    if long_side > short_side:
        prev_long = long_side - step
        prev_size = (prev_long, short_side) if grow_width else (short_side, prev_long)
        if not validate_image_size(prev_size[0], prev_size[1], model):
            if calculate_image_tokens(quality, prev_size[0], prev_size[1], model) > tokens:
                return None
    max_long = min(
        max_dimension,
        (int(limits["max_aspect_ratio"] * short_side) // step) * step,
        (limits["max_pixels"] // short_side // step) * step,
    )
    band = (2 * q * short_side + long_side) // (2 * long_side)
    # Walk down through lower aspect bands by lengthening the long side.
    for next_band in range(band - 1, 0, -1):
        threshold = (2 * q * short_side) // (2 * next_band + 1) + 1
        candidate_long = max(long_side + step, ((threshold + step - 1) // step) * step)
        if candidate_long > max_long:
            return None
        candidate_size = (
            (candidate_long, short_side)
            if grow_width
            else (short_side, candidate_long)
        )
        if calculate_image_tokens(
            quality,
            candidate_size[0],
            candidate_size[1],
            model,
        ) < tokens:
            return f"{candidate_size[0]}x{candidate_size[1]}"
    return None


def image_tokens_to_dollars(tokens, model="gpt-image-2"):
    price_per_million = IMAGE_MODEL_SPECS[model]["image_output_price_per_million_tokens"]
    return tokens * price_per_million / 1_000_000


# Alternate demo: choose one quality, then price several sizes at it.
# It has its own name so the demo() defined further down does not shadow it.
def demo_single_quality():
    model = "gpt-image-2"
    qualities = ["low", "medium", "high"]
    print("GPT image size helper demo")
    print("Enter image sizes like 1200x1600.")
    print("Press Enter at the size prompt to choose another quality.")
    print("Press Enter at the quality prompt, or enter an invalid quality choice, to exit.")
    while True:
        print()
        print("Quality choices:")
        for index, quality in enumerate(qualities, start=1):
            print(f" {index}. {quality}")
        try:
            quality_choice = input("Choose quality 1, 2, or 3: ").strip()
        except EOFError:
            print()
            return
        if not quality_choice:
            return
        if quality_choice not in {"1", "2", "3"}:
            return
        quality = qualities[int(quality_choice) - 1]
        print()
        print(f"Using quality: {quality}")
        while True:
            try:
                size_text = input("Size: ").strip()
            except EOFError:
                print()
                return
            if not size_text:
                break
            try:
                clean_size_text = size_text.lower().replace(" ", "")
                if "x" in clean_size_text:
                    parts = clean_size_text.split("x")
                    if len(parts) != 2:
                        raise ValueError
                    width = int(parts[0])
                    height = int(parts[1])
                else:
                    width = int(clean_size_text)
                    height_text = input("Height: ").strip()
                    if not height_text:
                        break
                    height = int(height_text)
            except ValueError:
                print("Enter a size like 1200x1600, or enter a whole-number width.")
                continue
            original_width = width
            original_height = height
            errors = validate_image_size(width, height, model)
            if errors:
                print()
                print(f"{original_width}x{original_height} is not a valid request size:")
                for error in errors:
                    print(f" - {error}")
                width, height = normalize(width, height, model)
                print(f"Normalized size: {original_width}x{original_height} -> {width}x{height}")
            else:
                print()
                print(f"{width}x{height} is valid.")
                print("No normalization needed.")
            tokens = calculate_image_tokens(quality, width, height, model)
            cheaper_size = recommend_cheaper_larger_size(model, (width, height), quality)
            print(f"Output tokens: {tokens:,}")
            print("Image output cost: "
                  f"${image_tokens_to_dollars(tokens, model):.6f}")
            if cheaper_size:
                cheaper_tokens = calculate_image_tokens(
                    quality,
                    *map(int, cheaper_size.split("x")),
                    model,
                )
                print(
                    f"Cheaper larger size: {cheaper_size} "
                    f"({cheaper_tokens:,} output tokens)"
                )
            else:
                print("Cheaper larger size: none found")
            print()


def demo():
    model = "gpt-image-2"
    qualities = ["low", "medium", "high"]
    print("GPT image size helpers demo")
    print("Enter sizes like 1200x1600. Blank input exits.")
    while True:
        text = input("\nSize: ").strip().lower().replace(" ", "")
        if not text:
            return
        try:
            if "x" in text:
                width, height = map(int, text.split("x"))
            else:
                width = int(text)
                height = int(input("Height: ").strip())
        except ValueError:
            print("Enter a size like 1200x1600, or a whole-number width.")
            continue
        errors = validate_image_size(width, height, model)
        if errors:
            old_size = f"{width}x{height}"
            print(f"{old_size} is invalid for {model}:")
            for error in errors:
                print(f"- {error}")
            print("We can fix that in code, though!")
            width, height = normalize(width, height, model)
            print(f"Normalized: {old_size} -> {width}x{height}")
        else:
            print(f"{width}x{height} is valid.")
        print(f"-- costs for {width}x{height} --")
        for quality in qualities:
            tokens = calculate_image_tokens(quality, width, height, model)
            cost = image_tokens_to_dollars(tokens, model)
            cheaper_size = recommend_cheaper_larger_size(model, (width, height), quality)
            print(f"{quality}: {tokens:,} tokens, ${cost:.6f}")
            if cheaper_size:
                cheaper_width, cheaper_height = map(int, cheaper_size.split("x"))
                cheaper_tokens = calculate_image_tokens(
                    quality,
                    cheaper_width,
                    cheaper_height,
                    model,
                )
                cheaper_cost = image_tokens_to_dollars(cheaper_tokens, model)
                print(
                    f" cheaper larger is {cheaper_size}: "
                    f"{cheaper_tokens:,} tokens, ${cheaper_cost:.6f}"
                )
            else:
                print(" cheaper larger: none")


if __name__ == "__main__":
    demo()
Python with a verbose breakdown of what’s being computed, reverse-engineered. Input validation and error messages included.
from typing import Final, Literal

## Validation constants

# Size validation happens before token calculation. The accepted size space is a
# 16 px lattice, so a dimension that is only 1 px away from a valid value can
# still have no token price.
SIZE_GRANULARITY_PX: Final[int] = 16

# The pixel budget is inclusive at both ends. These limits are checked against
# width * height, independent of the aspect-ratio band used later.
MIN_PIXEL_BUDGET: Final[int] = 655_360
MAX_PIXEL_BUDGET: Final[int] = 8_294_400

# The maximum edge rule is separate from total pixel budget. An image can be
# under the pixel budget and still be invalid if either side is too long.
MAX_EDGE_LENGTH_PX: Final[int] = 3_840

# Aspect ratio is checked as long_edge / short_edge <= 3. The exact 3:1 case is
# valid; only ratios greater than 3:1 are rejected.
MAX_ASPECT_RATIO: Final[int] = 3


def validate_image_size(width: int, height: int) -> list[str]:
    """
    Return validation messages for dimensions that cannot receive a token price.

    An empty list means the dimensions are eligible for token calculation.
    """
    if type(width) is not int or type(height) is not int or width <= 0 or height <= 0:
        return ["Enter whole-number width and height values greater than 0."]
    errors: list[str] = []
    if width % SIZE_GRANULARITY_PX != 0 or height % SIZE_GRANULARITY_PX != 0:
        errors.append("Width and height must both be divisible by 16.")
    pixel_budget = width * height
    if pixel_budget > MAX_PIXEL_BUDGET:
        errors.append(
            f"Pixel budget must be no greater than "
            f"{MAX_PIXEL_BUDGET:,} pixels, inclusive."
        )
    if pixel_budget < MIN_PIXEL_BUDGET:
        errors.append(
            f"Pixel budget must be at least "
            f"{MIN_PIXEL_BUDGET:,} pixels, inclusive."
        )
    long_edge = max(width, height)
    short_edge = min(width, height)
    if long_edge > MAX_EDGE_LENGTH_PX:
        errors.append(
            f"Maximum edge length must be less than or equal to "
            f"{MAX_EDGE_LENGTH_PX:,}px."
        )
    if long_edge > MAX_ASPECT_RATIO * short_edge:
        errors.append("Aspect ratio must be no greater than 3:1.")
    return errors


## Algorithm constants

Quality = Literal["low", "medium", "high"]

# The quality setting enters the calculation as an integer axis factor.
# For square images, the pre-area token grid is this value squared:
# low=16*16, medium=48*48, high=96*96.
QUALITY_AXIS_FACTORS: Final[dict[Quality, int]] = {
    "low": 16,
    "medium": 48,
    "high": 96,
}

# The final area multiplier is:
#
#     (AREA_OFFSET_PIXELS + width * height) / AREA_SCALE_DENOMINATOR
#
# The positive offset means the token count is not proportional to image area
# alone. At the minimum valid pixel budget, the offset is larger than the image
# itself, so smaller valid images still carry a substantial fixed area term.
AREA_OFFSET_PIXELS: Final[int] = 2_000_000
AREA_SCALE_DENOMINATOR: Final[int] = 4_000_000


def _round_half_up_ratio(numerator: int, denominator: int) -> int:
    """
    Round numerator / denominator to the nearest integer, with exact halves up.

    The aspect component falls into integer bands rather than staying continuous.
    Using integer arithmetic keeps boundary cases deterministic and avoids
    moving a half-step threshold because of binary floating-point representation.
    """
    return (2 * numerator + denominator) // (2 * denominator)


def _ceil_div(numerator: int, denominator: int) -> int:
    """
    Return ceil(numerator / denominator) for positive integers.

    Token totals are whole numbers. Any non-zero fractional remainder in the
    scaled calculation increases the reported total to the next integer token.
    """
    return (numerator + denominator - 1) // denominator


def calculate_image_tokens(quality: Quality, width: int, height: int) -> int:
    """
    Return the output token count for an image setting.

    The calculation is symmetric in width and height. Rotating an image does not
    change the result because only the long edge, short edge, and total area are
    used.

    Raises:
        ValueError: If quality is not one of "low", "medium", or "high", or if
            the dimensions violate the size rules.
    """
    if quality not in QUALITY_AXIS_FACTORS:
        allowed = ", ".join(repr(value) for value in QUALITY_AXIS_FACTORS)
        raise ValueError(f"quality must be one of {allowed}; got {quality!r}")
    errors = validate_image_size(width, height)
    if errors:
        raise ValueError("Invalid image size:\n- " + "\n- ".join(errors))
    long_edge = max(width, height)
    short_edge = min(width, height)
    quality_axis_factor = QUALITY_AXIS_FACTORS[quality]

    # The longer side keeps the full quality axis factor. The shorter side is
    # reduced according to short_edge / long_edge and then rounded into an
    # integer band.
    #
    # This rounded band is the source of the visible downward jumps in resolution
    # tables: as one edge grows, area rises gradually, but the aspect band can
    # drop by 1 at a threshold. That one-band drop can outweigh the added pixels.
    short_axis_factor = _round_half_up_ratio(
        quality_axis_factor * short_edge,
        long_edge,
    )

    # The calculation behaves like a rectangular token grid:
    #
    #     full quality axis factor * aspect-adjusted short-axis factor
    #
    # For square images, short_axis_factor equals quality_axis_factor, so the
    # grid is exactly quality_axis_factor squared.
    token_grid = quality_axis_factor * short_axis_factor
    pixel_budget = width * height

    # Area then scales the token grid with a fixed positive offset. Keeping this
    # as integer arithmetic gives an exact final ceiling instead of depending on
    # floating-point rounding near token boundaries.
    scaled_token_numerator = token_grid * (AREA_OFFSET_PIXELS + pixel_budget)
    return _ceil_div(scaled_token_numerator, AREA_SCALE_DENOMINATOR)


def main() -> None:
    quality: Quality = "medium"
    width = 1200
    height = 1472
    try:
        tokens = calculate_image_tokens(quality, width, height)
        print(f"quality={quality}, width={width}, height={height} -> {tokens} tokens")
    except ValueError as v:
        print(f"-- ValueError --\n{v}")


if __name__ == "__main__":
    main()
Documentation
If OpenAI were to explain “how to calculate costs” in natural language, here’s what it might look like.
Summary
Get out paper and a calculator…
How to calculate gpt-image-2 image output token cost
For gpt-image-2, the output cost for an image is determined from:
- the requested quality: low, medium, or high
- the image width and height in pixels
- a model-specific aspect-ratio adjustment
- a model-specific area multiplier
- the model price of $30 per 1,000,000 output tokens
The final result is a whole-number output token count. Dollar cost is then calculated from that token count.
1. First, check whether the size is priceable
Before calculating tokens, the image dimensions must pass all size rules.
Let:
pixel_budget = width * height
long_edge = max(width, height)
short_edge = min(width, height)
The image is valid only if all of the following are true:
- width and height are positive whole numbers.
- Both dimensions are divisible by 16.
- The total pixel count is at least 655,360 pixels.
- The total pixel count is no greater than 8,294,400 pixels.
- Neither side is longer than 3,840 pixels.
- The aspect ratio is no greater than 3:1.
That last rule means:
long_edge / short_edge <= 3
or equivalently:
long_edge <= 3 * short_edge
An exact 3:1 image is valid. Anything wider or taller than 3:1 is not.
If the image fails these checks, it does not receive a token price from this formula.
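As a quick way to probe the boundaries of these rules, here is a minimal sketch of a yes/no check. The constant names and the `is_priceable` function are illustrative, not anything OpenAI publishes; the limits are the ones listed above.

```python
# Limits as quoted in the rules above; the names are illustrative.
STEP_PX = 16
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400
MAX_EDGE_PX = 3_840
MAX_ASPECT_RATIO = 3


def is_priceable(width: int, height: int) -> bool:
    """Return True when the size passes every rule in step 1."""
    if width <= 0 or height <= 0:
        return False
    long_edge, short_edge = max(width, height), min(width, height)
    return (
        width % STEP_PX == 0
        and height % STEP_PX == 0
        and MIN_PIXELS <= width * height <= MAX_PIXELS
        and long_edge <= MAX_EDGE_PX
        and long_edge <= MAX_ASPECT_RATIO * short_edge
    )


# Boundary probes: exact 3:1, one step past 3:1, an off-lattice size,
# and a size that sits exactly on the maximum pixel budget and edge length.
for size in [(1024, 1024), (3072, 1024), (3088, 1024), (333, 355), (3840, 2160)]:
    print(size, is_priceable(*size))
```

Here 3072x1024 passes because exactly 3:1 is allowed, while 3088x1024 is on the 16 px lattice but one step too wide.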
2. Choose the quality axis factor
Each quality level maps to a model-specific integer factor:
low -> 16
medium -> 48
high -> 96
Call this value:
quality_axis_factor
For example, with medium quality:
quality_axis_factor = 48
This factor acts like the token-grid size for the long side of the image.
For square images, the pre-area token grid is:
quality_axis_factor * quality_axis_factor
So before the area multiplier:
low square grid = 16 * 16 = 256
medium square grid = 48 * 48 = 2,304
high square grid = 96 * 96 = 9,216
3. Normalize the image orientation
The calculation is symmetric in width and height.
A 1200 x 1472 image and a 1472 x 1200 image produce the same token count.
Use:
long_edge = max(width, height)
short_edge = min(width, height)
The long edge keeps the full quality axis factor.
The short edge receives an adjusted factor based on the image’s aspect ratio.
4. Calculate the aspect-adjusted short-axis factor
The short side does not keep the full quality factor unless the image is square.
Instead, calculate:
raw_short_axis = quality_axis_factor * short_edge / long_edge
Then round that value to the nearest whole number.
Exact halves round upward.
So:
short_axis_factor = round_half_up(
    quality_axis_factor * short_edge / long_edge
)
For example, with medium quality and a 1472 x 1200 image:
quality_axis_factor = 48
long_edge = 1472
short_edge = 1200
raw_short_axis = 48 * 1200 / 1472
               = 39.130...
short_axis_factor = 39
This short_axis_factor is an integer aspect-ratio band.
That integer band is very important: it is the reason prices can move in steps instead of changing smoothly.
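One practical catch when coding this step: Python's built-in round() rounds exact halves to the nearest even integer, not upward. The sketch below does the half-up rounding in integer arithmetic instead; the function name is illustrative, and the 2048 x 1088 size is just a valid size picked because it lands exactly on a half.

```python
def round_half_up_ratio(numerator: int, denominator: int) -> int:
    """Round numerator / denominator to the nearest integer, exact halves up."""
    return (2 * numerator + denominator) // (2 * denominator)


# The worked example: medium quality (48), 1472 x 1200.
print(round_half_up_ratio(48 * 1200, 1472))

# An exact-half case: low quality (16), 2048 x 1088 gives 16 * 1088 / 2048 = 8.5.
half_up = round_half_up_ratio(16 * 1088, 2048)
builtin = round(16 * 1088 / 2048)  # round-half-to-even
print(half_up, builtin)
```

The two results differ on the half case, so using round() would put that size in the wrong band.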
Aside: why a larger image can sometimes cost fewer tokens
The formula has two competing parts:
- Area increases gradually as width or height increases.
- The aspect-ratio band decreases in whole-number steps as the image gets farther from square.
The short-axis factor is rounded to an integer:
short_axis_factor = round_half_up(
    quality_axis_factor * short_edge / long_edge
)
As the aspect ratio becomes wider or taller, this value goes down.
But it does not go down smoothly. It stays fixed for a while, then drops by 1.
Within a single band, the short_axis_factor is fixed. If one dimension increases, the pixel area increases, so the token count usually rises slightly.
But when the image crosses into the next aspect-ratio band, the short_axis_factor drops by 1. That reduces the token grid all at once.
That drop can be larger than the small increase caused by the extra pixels.
So the token count can look like this:
same band: area increases -> token count creeps upward
band boundary: short-axis factor drops -> token count jumps downward
same band: area increases -> token count creeps upward again
This creates a sawtooth pattern.
For example, with height pinned at 1200 and quality medium:
1200 x 1200:
    short_axis_factor = 48
    tokens = 1982
1216 x 1200:
    short_axis_factor = 47
    tokens = 1951
1232 x 1200:
    short_axis_factor = 47
    tokens = 1962
1248 x 1200:
    short_axis_factor = 46
    tokens = 1931
The 1216 x 1200 image has more pixels than 1200 x 1200, but it costs fewer tokens because it crossed from aspect band 48 down to aspect band 47.
Then 1232 x 1200 is still in band 47, so the larger area raises the token count slightly.
Then 1248 x 1200 crosses into band 46, so the token count drops again.
For a fixed 1200 height, this also explains the larger pattern:
- As width increases toward 1200, the image becomes more square, so the aspect band tends to rise and the token count rises.
- After width passes 1200, the image becomes wider, so the aspect band tends to fall and the token count falls in steps.
- Within each band, area still causes small upward movement.
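The sawtooth is easy to reproduce. This is a small sketch, assuming the formula described in this post; `band_and_tokens` is an illustrative name and it skips size validation.

```python
def band_and_tokens(q: int, width: int, height: int) -> tuple[int, int]:
    """Return (short_axis_factor, tokens) for quality axis factor q."""
    long_edge, short_edge = max(width, height), min(width, height)
    band = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    numerator = q * band * (2_000_000 + width * height)
    return band, -(-numerator // 4_000_000)  # ceiling division


# Medium quality (48), height pinned at 1200, width stepping by 16 px.
rows = [(w, *band_and_tokens(48, w, 1200)) for w in range(1200, 1264, 16)]
for width, band, tokens in rows:
    print(f"{width} x 1200: band {band}, {tokens} tokens")
```

Widening the range shows the same pattern repeating: small climbs inside a band, then a drop each time the band steps down.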
5. Calculate the token grid
After calculating the short-axis factor, multiply it by the quality axis factor:
token_grid = quality_axis_factor * short_axis_factor
For the medium, 1472 x 1200 example:
quality_axis_factor = 48
short_axis_factor = 39
token_grid = 48 * 39
           = 1872
This is not the final token count yet.
It is the aspect-adjusted grid before applying the area multiplier.
6. Calculate the area multiplier
The image area is used with a fixed positive offset.
First calculate:
pixel_budget = width * height
Then calculate the model-specific area term:
area_multiplier = (2,000,000 + pixel_budget) / 4,000,000
The 2,000,000 pixel offset means the token count is not proportional to image area alone.
Smaller valid images still carry a substantial fixed area term.
For example, with a 1472 x 1200 image:
pixel_budget = 1472 * 1200
             = 1,766,400
area_multiplier = (2,000,000 + 1,766,400) / 4,000,000
                = 3,766,400 / 4,000,000
                = 0.9416
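To see how much the fixed offset flattens the effect of area, this sketch computes the multiplier exactly with fractions.Fraction, for the example and for the smallest and largest pixel budgets the size rules allow. The names are illustrative.

```python
from fractions import Fraction

AREA_OFFSET_PIXELS = 2_000_000
AREA_SCALE_DENOMINATOR = 4_000_000


def area_multiplier(width: int, height: int) -> Fraction:
    """Exact area multiplier for a width x height image."""
    return Fraction(AREA_OFFSET_PIXELS + width * height, AREA_SCALE_DENOMINATOR)


print(float(area_multiplier(1472, 1200)))

# Multiplier at the minimum and maximum valid pixel budgets.
low_end = Fraction(AREA_OFFSET_PIXELS + 655_360, AREA_SCALE_DENOMINATOR)
high_end = Fraction(AREA_OFFSET_PIXELS + 8_294_400, AREA_SCALE_DENOMINATOR)
print(float(low_end), float(high_end), float(high_end / low_end))
```

The pixel count grows about 12.7 times between those two extremes, but the multiplier grows by less than 4 times, which is the offset at work.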
7. Apply the area multiplier and round up
The final output token count is:
tokens = ceil(token_grid * area_multiplier)
Equivalently:
tokens = ceil(
    quality_axis_factor
    * short_axis_factor
    * (2,000,000 + width * height)
    / 4,000,000
)
The final rounding is always upward to the next whole token.
For the medium, 1472 x 1200 example:
token_grid = 1872
area_multiplier = 0.9416
raw_tokens = 1872 * 0.9416
           = 1762.6752
tokens = ceil(1762.6752)
       = 1763
So:
medium, 1472 x 1200 -> 1763 output tokens
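If you want to check the worked example without trusting floating point, here is a minimal sketch that keeps steps 3 to 7 in integers and cross-checks the ceiling against an exact rational. The function name is illustrative and no size validation is done.

```python
from fractions import Fraction
from math import ceil


def image_output_tokens(q: int, width: int, height: int) -> int:
    """Steps 3 to 7 in integer arithmetic for quality axis factor q."""
    long_edge, short_edge = max(width, height), min(width, height)
    short_axis_factor = (2 * q * short_edge + long_edge) // (2 * long_edge)
    numerator = q * short_axis_factor * (2_000_000 + width * height)
    # -(-a // b) is ceiling division for positive integers.
    return -(-numerator // 4_000_000)


tokens = image_output_tokens(48, 1472, 1200)
print(tokens)

# Cross-check against an exact rational ceiling, and confirm that rotating
# the image leaves the count unchanged.
exact = ceil(Fraction(48 * 39 * (2_000_000 + 1472 * 1200), 4_000_000))
rotated = image_output_tokens(48, 1200, 1472)
print(tokens == exact, tokens == rotated)
```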
8. Convert output tokens to dollars
The model price is:
$30 per 1,000,000 output tokens
So the dollar cost is:
dollar_cost = tokens * 30 / 1,000,000
or:
dollar_cost = tokens * 0.00003
For the 1763 token example:
dollar_cost = 1763 * 30 / 1,000,000
            = 0.05289
So:
medium, 1472 x 1200 -> 1763 output tokens -> $0.05289
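For billing-style totals over many images, one option is to do the conversion with decimal.Decimal so the per-image figure stays exact when it is multiplied or summed. This is only a sketch; the names are illustrative, and the $30 price is the one quoted above.

```python
from decimal import Decimal

PRICE_PER_MILLION_TOKENS = Decimal("30")


def tokens_to_dollars(tokens: int) -> Decimal:
    """Convert an output token count to an exact dollar amount."""
    return tokens * PRICE_PER_MILLION_TOKENS / Decimal(1_000_000)


per_image = tokens_to_dollars(1763)
print(per_image)

# Cost of a batch of 1,000 such images.
batch = per_image * 1000
print(batch)
```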
Compact formula
For a valid image:
q = quality axis factor
    low -> q = 16
    medium -> q = 48
    high -> q = 96
long_edge = max(width, height)
short_edge = min(width, height)
short_axis_factor = round_half_up(q * short_edge / long_edge)
tokens = ceil(
    q
    * short_axis_factor
    * (2,000,000 + width * height)
    / 4,000,000
)
dollars = tokens * 30 / 1,000,000
The important non-obvious behavior is that short_axis_factor is an integer band. As aspect ratio moves away from square, that band drops in steps. Those drops can outweigh the gradual increase from extra image area, which is why a larger non-square image can sometimes cost fewer output tokens than a smaller, more square one.
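Put together, the compact formula is enough to print a small price table. The sketch below assumes the constants given in this post and does no size validation; `tokens_and_dollars` is an illustrative name, and the sizes in the loop are just sample valid sizes.

```python
QUALITY_AXIS_FACTORS = {"low": 16, "medium": 48, "high": 96}


def tokens_and_dollars(quality: str, width: int, height: int) -> tuple[int, float]:
    """Return (output tokens, dollar cost) using the compact formula."""
    q = QUALITY_AXIS_FACTORS[quality]
    long_edge, short_edge = max(width, height), min(width, height)
    band = (2 * q * short_edge + long_edge) // (2 * long_edge)  # round half up
    tokens = -(-(q * band * (2_000_000 + width * height)) // 4_000_000)  # ceil
    return tokens, tokens * 30 / 1_000_000


for width, height in [(1024, 1024), (1536, 1024), (1472, 1200)]:
    for quality in QUALITY_AXIS_FACTORS:
        tokens, dollars = tokens_and_dollars(quality, width, height)
        print(f"{width}x{height} {quality:<6} {tokens:>5,} tokens  ${dollars:.5f}")
```

Note that 1536x1024 comes out cheaper than 1024x1024 at every quality even though it has more pixels, which is the band effect described above.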
Result
Look forward to your favorite image creation applications telling you not just your input cost in language tokens (also the “vision” image input component), but also your output and final cost - before you push a “send” button!