You have control over the 32-bit RGBA being sent, and even masking to outfill into that (although the gpt-image-1 mask seems to be a big lie).
You also have the ability to restore the original aspect ratio by cropping instead of delivering an expanded image - or allow the selection of either.
Since I seem to have at least a bit of imagination, I’ll apply that as a bit of AI prompting - about 1000 tokens of input - wanting some reusable code. No guarantee that it actually goes the optimized-budget route as intended, because vetting AI writing is actual labor.
Let’s break down the requirements and design the function step by step:
Requirements Analysis
- Input: Arbitrary-sized PIL image.
- Output:
- A new PIL RGBA image, resized and padded to one of the allowed sizes:
- 1024x1024
- 1024x1536
- 1536x1024
- Metadata:
- The output image’s dimensions (should be one of the above).
- The crop area (box) within the output image that contains the actual (non-transparent) content, so the original aspect ratio can be restored later.
- Algorithm (a worked example follows this list):
- Resize: Scale the input image so that it fits entirely within one of the allowed sizes, maximizing the image area, but not exceeding any dimension.
- Choose Output Size: Pick the allowed size whose aspect ratio is closest to the input’s, breaking ties in favor of the square.
- Upsize: If the input is smaller than the chosen size, scale up so that at least one dimension matches the target, and pad the other with transparency.
- Center: Center the image in the output canvas.
- Metadata: Return the crop box (left, upper, right, lower) of the actual image content within the output.
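To make that concrete, here is the arithmetic for a hypothetical 500x800 portrait input (my own worked numbers, not part of the requirements):

```python
# Hypothetical 500x800 input, aspect 0.625 -> closest allowed size is 1024x1536.
scale = min(1024 / 500, 1536 / 800)                  # min(2.048, 1.92) = 1.92 (an upscale)
new_size = (round(500 * scale), round(800 * scale))  # (960, 1536): height hits the target edge
left = (1024 - new_size[0]) // 2                     # 32 px of transparent padding each side
crop_box = (left, 0, left + new_size[0], 1536)       # (32, 0, 992, 1536)
```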
Algorithm Design
- Allowed Sizes:
- (1024, 1024)
- (1024, 1536)
- (1536, 1024)
- Aspect Ratio Matching (sketched standalone after this list):
- For each allowed size, compute the aspect ratio difference from the input.
- If the input aspect ratio is closer to 1:1 than to the other two, pick the square.
- Otherwise, pick the closest.
- Resizing:
- Compute the scale factor so that the image fits entirely within the chosen size (no cropping).
- If the image is smaller, scale up so that at least one dimension matches the target, but do not exceed the target in any dimension.
- Padding:
- Create a transparent RGBA canvas of the target size.
- Paste the resized image centered.
- Crop Box:
- The crop box is the rectangle within the output image where the actual image is (i.e., where the non-transparent pixels are).
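As referenced above, here is the aspect-ratio matching step as a standalone sketch (the same logic appears inside the full function below; `pick_target_size` is my name for it, not part of the plan):

```python
def pick_target_size(w: int, h: int) -> tuple[int, int]:
    """Pick the allowed size with the closest aspect ratio, square on ties."""
    sizes = [(1024, 1024), (1024, 1536), (1536, 1024)]
    aspect = w / h
    dists = [abs(aspect - tw / th) for tw, th in sizes]
    if dists[0] <= min(dists[1:]):  # bias toward square
        return sizes[0]
    return sizes[1] if dists[1] < dists[2] else sizes[2]
```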
Implementation Plan
- Function: `resize_and_pad_for_api`
- Arguments: `img: Image.Image`
- Returns: `tuple[Image.Image, dict[str, Any]]`
- Use only built-in types and PIL.
Implementation
from typing import Any

from PIL import Image


def resize_and_pad_for_api(img: Image.Image) -> tuple[Image.Image, dict[str, Any]]:
    """
    Resize and pad an image to fit one of the allowed API sizes
    (1024x1024, 1024x1536, 1536x1024).

    The image is scaled to fit entirely within the chosen size, centered,
    and padded with transparency. Returns the new RGBA image and metadata
    including the output size and the crop area of the actual image content.

    Args:
        img: Input PIL Image (any mode, any size).

    Returns:
        (output_img, metadata)
        output_img: PIL Image in RGBA mode, one of the allowed sizes.
        metadata: dict with keys:
            - 'size': (width, height) of the output image
            - 'crop_box': (left, upper, right, lower) box of actual image
              content within the output
            - 'scale': scale factor applied to the input image
            - 'pasted_size': (width, height) of the resized image before padding
    """
    allowed_sizes = [(1024, 1024), (1024, 1536), (1536, 1024)]
    w_in, h_in = img.size
    aspect_in = w_in / h_in

    # Distance of the input aspect ratio from each allowed size's ratio.
    dists = [abs(aspect_in - w / h) for w, h in allowed_sizes]

    # Bias toward square: it wins ties; otherwise take the closest rectangle.
    if dists[0] <= min(dists[1:]):
        target_size = allowed_sizes[0]
    else:
        target_size = allowed_sizes[1] if dists[1] < dists[2] else allowed_sizes[2]
    w_out, h_out = target_size

    # Scale so the input fits entirely within the target (no cropping).
    scale = min(w_out / w_in, h_out / h_in)
    new_w, new_h = round(w_in * scale), round(h_in * scale)

    # Ensure at least one dimension reaches the target edge-to-edge;
    # rounding can leave both dimensions slightly short.
    if new_w != w_out and new_h != h_out:
        # Snap whichever dimension is closer to its target, then recompute.
        if abs(w_out - new_w) < abs(h_out - new_h):
            scale = w_out / w_in
        else:
            scale = h_out / h_in
        new_w, new_h = round(w_in * scale), round(h_in * scale)

    # Clamp so rounding never pushes past the target size.
    new_w = min(new_w, w_out)
    new_h = min(new_h, h_out)

    # Resize, then paste centered onto a fully transparent canvas.
    img_rgba = img.convert("RGBA")
    img_resized = img_rgba.resize((new_w, new_h), Image.LANCZOS)
    out_img = Image.new("RGBA", (w_out, h_out), (0, 0, 0, 0))
    left = (w_out - new_w) // 2
    top = (h_out - new_h) // 2
    out_img.paste(img_resized, (left, top), img_resized)

    # Crop box: the rectangle holding the actual (non-transparent) content.
    crop_box = (left, top, left + new_w, top + new_h)
    metadata = {
        "size": (w_out, h_out),
        "crop_box": crop_box,
        "scale": scale,
        "pasted_size": (new_w, new_h),
    }
    return out_img, metadata
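A quick sanity check against the worked numbers, this time with a synthetic landscape image (reuses the imports above; assumes Pillow is installed):

```python
test = Image.new("RGB", (2000, 1000), "red")    # aspect 2.0 -> should map to 1536x1024
out, meta = resize_and_pad_for_api(test)
assert out.size == (1536, 1024)
assert meta["pasted_size"] == (1536, 768)       # scale = 0.768, width edge-to-edge
assert meta["crop_box"] == (0, 128, 1536, 896)  # 128 px transparent bands top and bottom
```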
With that passing, the function is ready to be imported and used. It is concise and covers the requirements above - though, per the earlier caveat, vetting it against your own inputs is still your labor.
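And to close the loop on the crop-instead-of-expansion idea from the top of the post, a minimal sketch of the restore step using the returned metadata (`restore_original` and `input.png` are hypothetical names; the API call itself is elided):

```python
from PIL import Image


def restore_original(edited: Image.Image, metadata: dict, original_size: tuple[int, int]) -> Image.Image:
    """Crop the API result back to the content area, then resize to the original dimensions."""
    content = edited.crop(metadata["crop_box"])
    return content.resize(original_size, Image.LANCZOS)


original = Image.open("input.png")
padded, meta = resize_and_pad_for_api(original)
# ... send `padded` (and optionally a mask) to the API; it returns a same-size image ...
edited = padded  # stand-in so this sketch runs; replace with the real API result
restored = restore_original(edited, meta, original.size)
```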