Orientation problem for vertical images

Image orientation seems to have become extremely unreliable in the last week. In fact, many prompts that previously generated images now produce text interpretations of the prompt instead. Something seems to have changed with the introduction of memory. Images also seem less interesting and sloppier.

I’m not sure how to get back to where I was a bunch of sessions ago, and I keep getting square images regardless. Yeah, I can start programming requests, but for goodness’ sake, does the entire world need to become programmers? :frowning:

If you are using ChatGPT, and not the API as this topic category indicates, then the only thing limiting you is clear language. For Mother’s Day:

Send this exact request to the dalle text2im method without alteration: {"size":"1024x1792","n":1,"prompt":"This tall image has a sculpted polymer clay art piece depicting a kindhearted woman with dark gray hair giving a gentle hug to her chubby black and white tuxedo breed cat with white-tipped paws and white nose and chin. Both the woman and the cat sit amidst potted plants which are also sculpted out of colorful clay and bottles of red wine made of plastic, with the cat purring contentedly while the woman smiles. The 3D scene of the hand-crafted artwork is adorned with polymer clay flowers and leaves that extend beyond the edges. The background is a paper craft wallpaper with textured patterns, and the artwork rests on a dark stained wood grain pattern. The image dimension is a tall portrait aspect ratio, for full body-length images."}
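(If you are on the API instead, the tall format is just a request parameter. Here is a minimal sketch, assuming the official openai Python SDK and the dall-e-3 model; the snippet is illustrative rather than the exact call ChatGPT makes.)

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

result = client.images.generate(
    model="dall-e-3",
    size="1024x1792",  # tall portrait canvas
    n=1,
    prompt="This tall image has a sculpted polymer clay art piece ...",  # the full prompt above
)
print(result.data[0].url)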

Just 2 of 8 had rotation problems, and you can see from the content that the outfill used to make images wide or tall isn’t very interesting, so they can still be used.

Okay, yeah, this is a painful issue, particularly for API users where complete automation is required. It happens for at least 20-30% of the clearly portrait prompts I send.

I don’t have a 100% reliable workaround for ChatGPT users, but I do have one for API users, detailed below.

I’ve had to code multiple layers on top of the API call to ensure the code outputs a portrait image no matter what.

It’s quite hacky and burns through credits at scale. So, OpenAI, I do hope you correct this soon; it’s been quite a while.

API Workflow

Here’s the high-level process for other developers (a code sketch follows the list).

  1. Ensure the prompt has clear orientation info at the beginning (e.g. vertical, tall, door poster).
  2. Once generated, use a vision model to detect the orientation (it does a fairly good job).

This prompt worked well for detecting orientation with the vision model:

gpt_prompt = (
    f"Analyze the orientation of the image. We expect the image to be in {self.aspect} orientation. "
    "Determine if the content of the image matches this expected orientation. "
    "Consider the following:\n\n"
    "- For portrait orientation: The image should be taller than it is wide, and the main subject should be oriented vertically.\n"
    "- For landscape orientation: The image should be wider than it is tall, and the main subject should be oriented horizontally.\n"
    "- Pay attention to the actual content, not just the image dimensions. A landscape image rotated 90 degrees is still landscape content.\n"
    "- Look for clear indicators of orientation such as the horizon line, vertical structures, or the natural orientation of subjects.\n"
)
  3. If it is not correctly oriented, rerun the DALL-E API call.
  4. If it is still not correctly oriented on the third attempt, rewrite the initial image prompt with another GPT call, ensuring only one character and emphasising vertical orientation, then try again.
  5. Repeat the process a couple more times.
  6. If all this fails, rotate the landscape image by -90 degrees, then scale and crop it to the portrait frame.
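Putting those steps together, here is a minimal sketch of the loop. It assumes the official openai Python SDK, dall-e-3 for generation, gpt-4o as the vision model, and requests plus Pillow for downloading and rotating; the helper names, model choices, and the ASPECT constant (standing in for self.aspect above) are illustrative, not the exact production code.

import base64
from io import BytesIO

import requests
from openai import OpenAI
from PIL import Image

client = OpenAI()  # assumes OPENAI_API_KEY is set

ASPECT = "portrait"
MAX_ATTEMPTS = 3  # generations per prompt before rewriting the prompt


def generate_image(prompt: str) -> Image.Image:
    """Generate a 1024x1792 image with DALL-E 3 and return it as a Pillow image."""
    result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1792", n=1)
    data = requests.get(result.data[0].url, timeout=60).content
    return Image.open(BytesIO(data))


def is_expected_orientation(image: Image.Image) -> bool:
    """Ask the vision model whether the image content matches the expected orientation."""
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    b64 = base64.b64encode(buffer.getvalue()).decode()
    check_prompt = (
        f"Analyze the orientation of the image. We expect the image to be in {ASPECT} orientation. "
        "Answer with the single word YES if the content matches this orientation, otherwise NO."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": check_prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return "YES" in response.choices[0].message.content.upper()


def rewrite_prompt(prompt: str) -> str:
    """Rewrite the prompt to a single subject while stressing vertical orientation."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Rewrite this image prompt so it has only one main subject and strongly "
                f"emphasises a tall, vertical portrait composition:\n\n{prompt}"
            ),
        }],
    )
    return response.choices[0].message.content


def rotate_to_portrait(image: Image.Image) -> Image.Image:
    """Last resort: rotate -90 degrees, then scale and centre-crop to 1024x1792."""
    rotated = image.rotate(-90, expand=True)
    rotated = rotated.resize((1024, int(rotated.height * 1024 / rotated.width)))
    top = max(0, (rotated.height - 1792) // 2)
    return rotated.crop((0, top, 1024, min(rotated.height, top + 1792)))


def generate_portrait(prompt: str) -> Image.Image:
    """Generate until the content is portrait, rewriting the prompt once, then fall back to rotation."""
    current_prompt = prompt
    for _ in range(2):  # original prompt, then one rewritten prompt
        for _ in range(MAX_ATTEMPTS):
            image = generate_image(current_prompt)
            if is_expected_orientation(image):
                return image
        current_prompt = rewrite_prompt(current_prompt)
    return rotate_to_portrait(image)  # give up and fix it mechanically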

Having tested this with over 1,000 generations, it seems pretty reliable, but it is very painful and costly.

Other tools like Leonardo have this nailed down, but they have other issues. I am sure it will be addressed in the next rollout.

Nothing is perfect.

ChatGPT Workaround

I did, however, notice that ensuring only one main subject in your prompt does reduce the occurrences, particularly if it is something that is vertical by design, like a tree, a standing human, etc.

Not ideal, as that won’t fit all use cases.


Ouch. Yes, that does not seem optimal. I would explain it to my users, refund their credits on that ONE call then have them try again… I’m sure they would understand, and it could save you some $$… good solution, tho!

Thanks, Paul. Yeah, if my application were a user-based application, that would be a good call.


Sorry, didn’t mean to assume.

The DALL-E team is aware of the issue, but I’ve not heard a lot of news recently… Hopefully that means they’re hard at work on the next iteration…

@vt1000

It does not give both images as vertical every time, but you may modify this:

A low-angle, close-up, telephoto portrait of Elara in a vertical portrait orientation, an ethereal being who appears to be 25 years old with a vertical portrait orientation. She has long silver hair, large silver eyes, a heart-shaped face, and a slender, tall figure. Her pale skin has a luminescent quality. She is wearing a floor-length, tattered gown shimmering like stardust in silver and midnight blue. She stands in a vertical portrait orientation and holds a small metal device with a blue light on top and green wires sticking out. Her silver eyes reflect the moon, and her expression is contemplative as she gazes skyward in a vertical portrait orientation, off-center to the right. She is standing upright on a misty hillside with the vast night sky above her. The scene is set in a Baroque Gothic, Romanticism, Victorian Steampunk style. The image is in a vertical portrait orientation.


I am trying to understand why some DALL-E images are rendered in the wrong orientation even with prompts like the Elara example above, where the vertical aspect ratio is highlighted explicitly, but it is not easy to work out.

Consistent results are not always achievable. Until OpenAI fully discloses how this works and provides detailed guidance on how to frame our prompts, we will continue to consume more tokens and may deplete our daily image limits quickly without achieving the desired outcomes.

From my experience over the last week, I’ve noticed four key observations, though they don’t encompass everything, and results may vary widely:

  1. When using expressions related to Gothic, Victorian, or other historical periods and specifying nighttime, a request for a female character dressed in women’s attire, even if requested vertically, results in a vertical frame but with horizontal landscape content.

  2. If a female character is described wearing clothes typically worn by men, such as trousers or suits, the resulting image comes out vertical.

  3. Specifying daytime instead, in the situation described in the first observation, results in a vertical image output.

  4. If male characters are dressed in women’s clothing, the image orientation tends to be vertical.

I tried each image below 7 times and got similar outputs, except for the daytime prompt, which gave the wrong orientation only once.

I’ve dropped only 2 samples of each here:

1- female - female attire - night

2- female - man attire - night

3- female - female attire - day

4- male - female attire - night