Cropping Images as they are Being Generated

First, this is my first time posting on this forum so I apologize in advance if I do something wrong.

Is it possible to add something to a prompt that will automatically crop an image as it is being generated? For example, an image I create may turn out like the one on the left, but I would prefer the output to look more like the one on the right.

I know there are plenty of ways to crop an image after it has been created, but I would like to be able to crop the image “on the fly.”

Does anyone have something I can include in the prompt to give me my desired results? Thank you.

The AI model cannot go back and revise an image while it is generating it. A literal “crop” or “zoom in more” would therefore need an additional turn: if the result is unsatisfactory, send the image as input to the “edits” endpoint with a new prompt, and the model regenerates another image based on its vision of the first.
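As a rough sketch of that second turn, the function below sends a saved image back through the edits endpoint with a reframing instruction. It assumes the official `openai` Python package, whose client exposes `images.edit`, and that gpt-image-1 returns base64 image data. The function name, file names, and the instruction text are made up for illustration.

```python
import base64
from pathlib import Path


def regenerate_with_edit(client, source_png, instruction, out_png, size="1024x1536"):
    """Send an existing image to the edits endpoint and save the regenerated result.

    `client` is assumed to be an `openai.OpenAI()` instance (or anything with the
    same `images.edit` method); the helper itself is hypothetical.
    """
    with open(source_png, "rb") as image_file:
        result = client.images.edit(
            model="gpt-image-1",
            image=image_file,
            prompt=instruction,
            size=size,
        )
    # gpt-image-1 is assumed to return base64-encoded image data, not a URL.
    Path(out_png).write_bytes(base64.b64decode(result.data[0].b64_json))
    return out_png


# Usage (needs the openai package and OPENAI_API_KEY set in the environment):
# from openai import OpenAI
# regenerate_with_edit(
#     OpenAI(),
#     "first.png",
#     "Recompose the same scene zoomed in, so the head meets the top of the image.",
#     "second.png",
# )
```

Note that this is a regeneration, not a pixel-exact crop, so details of the first image may change.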

Here are some hints about observed behavior with the gpt-image-1 model:

  • it seems to generate from the top down, judging by the behaviors seen;
  • square images seem overly zoomed in and can have the top cut off;
  • with long objects (a sword, for example), the AI does not correctly anticipate the needed length, and the object is truncated;
  • model behavior changes over time, and no method is provided to detect when such a change may have broken your application;
  • dall-e 3 definitely generates a square image first and then expands it; gpt-image-1 may have similar training that has it anticipating “expansion” in its framing, or it may now have had additional work done that tries to make square images without that symptom.

So the key here is to describe the framing of the person in the prompt: in particular, how much space there is above their head, in addition to stating that they are seen at full body length in a tall portrait aspect ratio image.

However, in trials, it seems nearly impossible to get the head to meet the top.

prompt = r"""
Create this image where a person occupies the full height of the photograph.
* Important: Plan composition and generate for hair and head of person to meet top of image, no top margin or padding

# Scene Brief

Create a single, photorealistic, editorial-style fashion portrait of a young woman outdoors by a white ranch fence under a leafy tree, summertime, country-chic vibe. Soft golden-hour daylight, shallow depth of field, creamy background bokeh. No text, logos, or extra people.

# Subject — Identity & Grooming

* Female, early–mid 20s, light skin.
* Long blonde hair, loose soft waves, middle part; hair falls to mid-chest on each side; light breeze hair movement.
* Makeup: natural/clean beauty; light foundation, subtle contour, soft pink lip; eyes lightly defined.
* Expression: calm confidence; slight closed-lip smile; direct eye contact with camera.

# Wardrobe — Exact Items & Colors

* **Top:** red-and-white small-check gingham blouse, cropped, deep V neckline, tied or knotted at the midriff; sleeves rolled to 3/4 length.
* **Bottoms:** faded blue high-waisted "daisy duke" denim short-shorts (classic 5-pocket), clean finish, no rips.
* **Shoes:** red high-heel sandals with ankle strap and open toe.

# Pose & Body Language

* Full-length standing pose, weight on right leg.
* Left leg crosses softly in front of right at the ankles.
* Torso relaxed with a gentle S-curve; subtle arch in lower back.
* Right arm extended to the side, resting lightly along the **top rail** of the fence; left arm relaxed with fingertips touching/near the fence.
* Fingers natural; shoulders relaxed; head level.

# Environment & Props

* Location: pastoral/farm setting. Prominent **white wooden three-rail fence** running horizontally behind subject.
* Large mature tree trunk just behind/left of subject; leafy canopy provides open-shade dapple without hard spots.
* Foreground grass near the fence line; distant soft meadow and additional white fencing rendered **out of focus**.
* No buildings, cars, or other people in frame.

# Lighting & Atmosphere

* Time: late afternoon golden hour.
* Key: soft, diffused open-shade light from above/left (tree canopy acting as scrim).
* Rim: gentle warm edge light on hair from behind/right.
* No harsh shadows on face; balanced exposure; high dynamic range; natural skin tones.

# Camera, Lens & Color

* Camera height: about subject’s waist/torso for a flattering slight-up perspective.
* Focal length look: **85mm** equivalent.
* Aperture look: **f/1.8–f/2.2** for shallow DOF and creamy bokeh.
* Color grade: warm, filmic (think Portra-like), muted greens; reds stay rich but not oversaturated; denim true blue.

# Subject Framing, Aspect Ratio & Spatial Rules

* **Aspect ratio:** **2:3 vertical** (e.g., **1024×1536**).
* **Framing:** full-body, close crop zoom in, subject centered on vertical axis.
* **Headroom:** no empty background space above the top of the head; head immediately meets image top (do **not** crop hair).
* **Footroom:** about **10%** of frame height below the soles of the heels (all of both shoes visible).
* **Side margins:** occupied by environment and background.
* Fence top rail intersects just below the subject’s elbows; rails run perfectly horizontal.
* Background intentionally soft; tree trunk remains recognizable on the left third.

# Rendering Quality & Style

* Ultra-photorealistic, crisp micro-detail on fabric weave and hair strands.
* No painterly textures, no HDR halos, no exaggerated skin smoothing.
* Natural proportions; correct limb counts; clean shoes; intact finger anatomy.
* Output a single image.

# Negative / Avoid

* No text, watermarks, or signage.
* No extra people, animals, or vehicles.
* No hat, sunglasses, bracelets, or large jewelry.
* Do not crop the head, hands, or feet; avoid Dutch angles.
* Avoid wind strong enough to obscure the face or clothing.

# One-Shot Prompt (compact summary)

“Photorealistic full-body portrait of a young blonde woman in a red-and-white gingham cropped blouse, blue high-waisted denim shorts, and red ankle-strap high-heel sandals, standing by and lightly leaning on a white three-rail ranch fence beneath a leafy tree; calm confident expression, direct eye contact; late-afternoon warm light, shallow depth (85mm, f/1.8 look), creamy bokeh, natural skin tones; pastoral meadow and distant white fence blurred in background.”
"""

Without the additional “important” line at the top, this framing is very consistent:

Adding the “Important: Plan composition” line results in slightly more zoom:

Go extreme, replacing two lines:

  • Important: This composition is framed so that the top of the person’s head is cropped and cut off at the level of the nose.
  • Headroom: no empty background space above the nose level; head is cropped off at nose and ear level (no eyes seen).
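The two-line swap above can be scripted so that framing variants stay otherwise identical. This is only a sketch: `swap_lines` is a hypothetical helper, `base` is a shortened stand-in for the full prompt, and the `* **Headroom:**` bullet markup on the replacement line is assumed to match the prompt's own style.

```python
def swap_lines(prompt, replacements):
    """Return a copy of `prompt` where each line starting with a given prefix is replaced.

    `replacements` maps a line prefix (ignoring leading spaces) to the full new line.
    Only the first matching line per prefix is replaced. Raises ValueError if a
    prefix matches nothing, so a silently unchanged prompt cannot slip through.
    """
    remaining = dict(replacements)
    out = []
    for line in prompt.splitlines():
        stripped = line.lstrip()
        for prefix in list(remaining):
            if stripped.startswith(prefix):
                line = remaining.pop(prefix)
                break
        out.append(line)
    if remaining:
        raise ValueError(f"no line starts with: {sorted(remaining)}")
    return "\n".join(out)


# Shortened stand-in for the full prompt, keeping only the lines that matter here.
base = """Create this image where a person occupies the full height of the photograph.
* Important: Plan composition and generate for hair and head of person to meet top of image, no top margin or padding
* **Framing:** full-body, close crop zoom in, subject centered on vertical axis.
* **Headroom:** no empty background space above the top of the head; head immediately meets image top (do **not** crop hair).
"""

extreme = swap_lines(base, {
    "* Important:": "* Important: This composition is framed so that the top of the person's head is cropped and cut off at the level of the nose.",
    "* **Headroom:**": "* **Headroom:** no empty background space above the nose level; head is cropped off at nose and ear level (no eyes seen).",
})
print(extreme)
```

Keeping the rest of the prompt byte-for-byte identical makes it easier to attribute a framing change to the swapped lines rather than to other wording.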

So “cropping images as they are being generated” is, at least, something that can be instructed.

So it is still a matter of “good luck” to get the top framing of the subject in the composition just as you want, but I was able to get the composition just shown repeatedly.
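Checking whether a prompt frames the subject the same way repeatedly means rendering several samples from it and comparing them. A minimal sketch, assuming the official `openai` Python package (`images.generate` with the `n` and `size` parameters) and base64 output from gpt-image-1; the function name, output folder, and sample count are illustrative only.

```python
import base64
from pathlib import Path


def generate_samples(client, prompt, out_dir, count=4, size="1024x1536"):
    """Render `count` images from one prompt and save them for side-by-side comparison.

    `client` is assumed to be an `openai.OpenAI()` instance; this helper is hypothetical.
    """
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size=size,  # tall portrait, matching the 2:3 framing asked for in the prompt
        n=count,
    )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(result.data):
        path = out_dir / f"sample_{index:02d}.png"
        # Each item is assumed to carry base64-encoded image data.
        path.write_bytes(base64.b64decode(item.b64_json))
        paths.append(path)
    return paths


# Usage (needs the openai package and OPENAI_API_KEY set in the environment):
# from openai import OpenAI
# generate_samples(OpenAI(), prompt, "framing_trials")
```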

Thank you very much for your assistance. It gives me a new direction to try.

-WVM