Exploring 4o Image Generation: Aspect Ratios, API Costs, and Prompt Efficiency – Let’s Discuss!

Hi everyone,

I’ve recently started experimenting with 4o image generation and noticed some interesting changes compared to DALL·E 3. I’d love to hear your thoughts and experiences on the following topics:

  1. Aspect Ratios:
    In DALL·E 3, we had standard aspect ratios like 16:9, 9:16, and 1:1. However, in 4o, it seems to have shifted to 2:3, 3:2, and 1:1. Has anyone been able to achieve other aspect ratios successfully?
    In DALL·E 3, we often saw “fake” aspect ratios where a white border was added, and the actual image had a different ratio. Is this still an issue with 4o?
  2. API Costs:
    Have there been any updates or changes in API pricing for image generation with 4o? If so, how are you optimizing costs while maintaining quality?
  3. Prompt Efficiency:
    With DALL·E 3, the optimal prompt length seemed to be between 64 and 128 characters, with anything longer often ignored. Negative prompts also didn’t work effectively. In 4o, this seems to have changed—does anyone know if the optimal character length has increased? Are negative prompts now more impactful?
  4. Additional Prompt Tips:
    Any new insights or tips for crafting effective prompts in 4o? I’d love to learn from your experiences!

Looking forward to hearing your thoughts and insights! Let’s share ideas and make the most of what 4o has to offer.

  1. No other ratios. 1024x1024, 1024x1536 and 1536x1024 (3:2). Few unfilled area errors, but instead cropping from being too zoomed in.

  2. No updates

    • a. Correction: Dall-E 3 supports 256 tokens internally, far more than 128 characters
    • b. The input is now both prompts, images, and conversational context - and proprietary
    • c. (this forum is bad at nested markdown)
  3. Use a chain of thought of having the ChatGPT AI write out everything that could possibly be needed in an image, to then finally send to the image make its trigger - this writing is observed by the image creation and significantly improves following.

1 Like
  1. Aspect Ratios: Interesting that no other ratios are available yet but with no other ratios you meant that there is still 16:9 available or only 2:3?. The cropping issue when zoomed in makes sense—have you found any tricks to avoid that, or is it just trial and error?
  2. Good to know DALL·E 3 supports up to 256 tokens internally—way more than I thought! Do you think 4o handles longer prompts better, or does it still cut off at a certain point?
  3. The idea of using ChatGPT to write out everything that could possibly be needed sounds smart. Do you have an example workflow for this? Would love to test it out myself.

You’re asking for a richly detailed, deeply immersive written scene set in a crowded traditional Japanese festival, rendered in hyper-realistic illustrated style with 3D depth, and just a faint suggestion it’s not a photo. You want me to describe this scene fully in writing, then issue a brief trigger prompt to the image generator using the surrounding context as guidance.


:sparkles: Hyper-Realistic Scene Description:

The scene unfolds in the golden twilight of a late summer evening, where a vibrant Japanese matsuri (festival) is in full swing at the grounds of an ancient Shinto shrine nestled in a valley between cedar-draped mountains. The sky glows in rich tones of violet and vermillion, streaked by the last embers of the setting sun behind curved temple roofs silhouetted like brushstrokes.

Paper lanterns—thousands of them—float in the air on delicate strings overhead, swaying slightly with the motion of the crowd. They emit a warm amber light that bounces off the lacquered wood of food stalls and the shimmering silk of yukatas worn by festival-goers. Each lantern bears unique calligraphy: some prayers, some vendor names, others tiny haiku. The interplay of natural twilight and artificial illumination creates a cinematic chiaroscuro across the faces of the bustling crowd.

In the foreground, a young couple in matching indigo yukata share a skewer of yakitori, caught in a moment of laughter—tiny droplets of sauce cling to their lips, their flushed cheeks lit warmly by the lantern glow. Beside them, an elderly man with a weathered face and hunched posture slowly turns a hand-crank on a ramune soda stall, his eyes twinkling as he watches children leap for the marbles trapped inside the glass bottles.

To the right, a festival game stall with wooden walls painted in fading red-and-gold motifs hosts a ring toss game, where a girl with a red camellia in her hair is mid-throw, her brows furrowed, arm extended, frozen in determined motion. Her brother beside her clutches a bag of goldfish in a clear plastic pouch—the fish suspended like living jewels.

Past them, taiko drummers line a raised stage draped in white-and-crimson cloth, their arms high above their heads as they prepare to strike in perfect unison. The motion blurs their sleeves slightly, their rhythm echoing like thunder through the night. The crowd pauses, reverent, as the sacred lion-dance procession begins snaking through the square—acrobats inside the ornately detailed shishi-mai costume twist and leap, the lion’s glass eyes glinting under the lantern light.

Overhead, plumes of slow-burning fireworks unfurl like giant chrysanthemums in the indigo sky, their reflections caught in the watery surface of a nearby koi pond surrounded by low stone lanterns and hanging wisteria. The entire festival scene feels alive with motion, scent (grilled corn, incense, roasted chestnuts), and emotion—joy, nostalgia, reverence, and celebration, all flowing together like threads in an intricate brocade.

Despite the photo-realistic precision—individual strands of hair catching the light, textures of fabric rendered with tactile accuracy, and the gleam of condensation on cold drinks—there’s just the faintest hint of painterly brushwork in the soft-edged shadows and haloed lantern lights, giving away that this is, in fact, a masterfully detailed illustration.


:bullseye: Image Generation Trigger Prompt:

“hyper-realistic illustrated Japanese traditional festival at twilight with dense crowd, yukata, lanterns, food stalls, taiko drummers, and fireworks; 3D depth and just a slight painterly effect”

image created


Then, the result of taking that big writing assignment “prompt”, and pasting it into a new “generate image” chat session alone:

You can observe the ability of “observation” going on here, and the resulting quality, depiction of particular elements, and instruction-following.

1 Like