Issue with Positional Accuracy in Generated Images

I am encountering issues with the positional accuracy of generated images using DALL-E. Specifically, I attempted to create an image of a mum playing with her 2-month-old baby. Despite specifying that the baby should be lying on her back on a play mat, the generated images consistently showed the baby sitting up, which does not align with typical developmental milestones for a 2-month-old.

Additionally, when I tried to generate an image of a baby lifting a blanket to find toys underneath, the images repeatedly depicted the baby under the blanket instead of lifting it to find the toys.

These positional inaccuracies are affecting the usability of the generated images. Is there anything that can be done to improve the model’s understanding of specific positional and developmental context? Any guidance or solutions would be greatly appreciated. Thank you!

Prompt:
An image of a joyful 8 months old girl in a crawling position. Her eyes are wide and excited when she is lifting a colorful blanket to discover a hidden rattle and two hidden balls under the blanket. The scene should radiate happiness and curiosity, emphasizing the baby’s engagement and excitement. The room is baby-safe, with soft, warm lighting, and the background is cozy and inviting, featuring baby-friendly decor. The floor in this room is clean and tidy.

Results:


Hi @aiza.tariq

Here are some prompts you can try:

Predominantly the baby is lying on her back on a colorful play mat, looking up with wide, curious eyes. A joyful mother playing with her 2-month-old baby in a warmly lit room. baby's small hands and feet are moving gently, wearing a soft onesie in pastel colors like light pink or baby blue. The mother, dressed in casual and comfortable clothing, such as a cozy sweater or t-shirt in soft tones like beige or light gray, is sitting next to the baby, smiling warmly, and gently holding the baby's hands. The play mat has soft, bright colors with simple patterns like stars, circles, or animals. Surrounding the baby are age-appropriate toys like a soft rattle and a plush animal within arm's reach. The room has warm, natural light, possibly from a nearby window, with gentle shadows. The environment is cozy and safe, with baby-friendly decor, emphasizing a sense of comfort and security. The focus is on the baby's developmental posture and the loving interaction between mother and child. Predominantly the baby is lying on her back on a colorful play mat, looking up with wide, curious eyes. 3D rendering, 8K quality, ar:16:9

A joyful mother playing with her 2-month-old baby in a warmly lit room. The baby is lying on her back on a colorful play mat, looking up with wide, curious eyes. She wears a soft onesie in pastel colors like light pink or baby blue, with small hands and feet gently moving. The mother is dressed in casual, comfortable clothing such as a cozy sweater or a t-shirt in soft tones like beige or light gray. She sits next to the baby, smiling warmly and gently holding the baby's hands. The play mat features bright, soft colors with simple patterns like stars, circles, or animals. Surrounding the baby are age-appropriate toys like a soft rattle and a plush animal within arm's reach. The room is filled with warm, natural light, possibly from a nearby window, casting gentle shadows. The environment is cozy and safe, with baby-friendly decor emphasizing comfort and security. Focus on the loving interaction between mother and child, capturing the sense of comfort and bonding. ar: 16:9, 3D rendering, ultra-high definition resolution.

An 8-month-old baby girl with soft brown hair in a crawling position on a colorful play mat, wearing a pastel-colored onesie. She is joyfully lifting the corner of a brightly patterned blanket in front of her, revealing a rattle and two colorful balls beneath. The blanket has playful designs like stars, animals, and geometric shapes. The room is cozy and child-friendly, with soft lighting and pastel colors. The play mat is multicolored with squares and circles. Surrounding the baby are plush toys and pillows, creating a warm and inviting environment. The baby is central in the image, focused on the toys she's discovering. ar: 16:9

An 8-month-old baby girl in a crawling position on a colorful play mat, her knees are pressing on a corner of the blanket, joyfully lifting the corner of a brightly patterned blanket in front of her. The blanket is positioned in front of the baby, not covering her. Beneath the blanket are a rattle and two colorful balls, which the baby is discovering with excitement and curiosity. The room is cozy and child-friendly, with soft lighting, pastel colors, and a clean play area. The baby is central in the image, focused on the toys she's revealing, with a warm and inviting environment surrounding her. ar: 16:9

Hi @polepole. Your outcomes look great. However, the prompts will be provided by the user. How should I modify their prompts so that these positional inaccuracies do not arise?

Predominantly: The word “Predominantly” makes DALL-E focus more on what you want.
In the 1st sample, the first sentence is “Predominantly the baby is lying on her back on a colorful play mat”. Repeating that sentence at the end of the prompt improves the result further.
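Since the prompts come from your users, this step could be applied automatically. Below is a minimal sketch, assuming you can identify the key positional sentence yourself; the function name `emphasize_position` and its parameters are hypothetical and not part of any DALL-E API. It only builds the prompt string and does not call an image model.

```python
def emphasize_position(prompt: str, key_sentence: str) -> str:
    """Lead with 'Predominantly <key sentence>' and repeat it at the end.

    Hypothetical helper: it only rearranges text, mirroring the
    "Predominantly" tip above.
    """
    key = key_sentence.strip().rstrip(".")
    if not key:
        # Nothing to emphasize; return the prompt unchanged.
        return prompt.strip()
    # Lower-case the first letter so it reads naturally after "Predominantly".
    key = key[0].lower() + key[1:]
    emphasis = f"Predominantly {key}."
    return f"{emphasis} {prompt.strip()} {emphasis}"


user_prompt = "A joyful mother playing with her 2-month-old baby in a warmly lit room."
print(emphasize_position(
    user_prompt,
    "The baby is lying on her back on a colorful play mat.",
))
```

The emphasized sentence appears twice in the output, once at the start and once at the end, as in the first sample prompt.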

Position of the elements: When DALL-E creates images, it falls back on the typical use of an object. Take blankets: a blanket is normally used to cover the body while sleeping, or to cover the legs when sitting on a chair. We need to say where the blanket is located; otherwise DALL-E shows it over the body or on the legs. In the sample prompts I wrote “her knees are pressing on a corner of the blanket, joyfully lifting the corner of a brightly patterned blanket in front of her.” DALL-E understands that if the blanket is under her knees, it is not over the baby. The prompt also says that the blanket is “in front of the baby”.
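One possible way to add such a placement statement to a user's prompt is sketched here. The helper `add_placement_clause` and its parameters are hypothetical; the wording of the clause is taken from the sample prompt above (“The blanket is positioned in front of the baby, not covering her.”).

```python
def add_placement_clause(prompt: str, obj: str, location: str, not_location: str) -> str:
    """Append an explicit sentence saying where an object is and where it is not.

    Hypothetical helper: it only edits the prompt text.
    """
    clause = f"The {obj} is positioned {location}, not {not_location}."
    body = prompt.strip()
    # Close the last sentence if the user left off the final punctuation.
    if body and body[-1] not in ".!?":
        body += "."
    return f"{body} {clause}"


print(add_placement_clause(
    "An 8-month-old baby girl is lifting a colorful blanket",
    obj="blanket",
    location="in front of the baby",
    not_location="covering her",
))
```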

If you try several times and DALL-E still does not give the output you want, replace “blanket” with a different object that is not normally used to cover a baby, for example “long tablecloth” or “large towel”.

Please look at the image below. I only replaced the word “blanket” with “long tablecloth” or “large towel” in your prompt; no other words were changed. It is otherwise exactly the prompt you provided above, and it works.
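If you want to apply this substitution to user prompts automatically, a rough sketch could look like the following. The function name `swap_covering_object` and its defaults are hypothetical. It is a plain text replacement using Python's standard `re` module, and it does not adjust capitalization or articles.

```python
import re


def swap_covering_object(prompt: str, replacement: str = "large towel",
                         target: str = "blanket") -> str:
    """Replace every whole-word occurrence of `target` (singular or plural).

    Hypothetical helper: the replacement is inserted as given, so a
    sentence-initial "Blanket" becomes lower-case "large towel".
    """
    pattern = re.compile(rf"\b{re.escape(target)}(s?)\b", re.IGNORECASE)
    # Keep a trailing plural "s" if the original word had one.
    return pattern.sub(lambda m: replacement + m.group(1), prompt)


user_prompt = (
    "She is lifting a colorful blanket to discover a hidden rattle "
    "and two hidden balls under the blanket."
)
print(swap_covering_object(user_prompt))
print(swap_covering_object(user_prompt, replacement="long tablecloth"))
```

Only the target word changes; the rest of the user's prompt is left untouched, as in the test described above.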

Sometimes we also have problems with back, front, or side angles, or with vertical and widescreen images. You may want to visit these three topics:

  1. DALL-3 does not seem to understand “from behind”
  2. Aspect Ratio in ChatGPT does not work
  3. Orientation problem for vertical images

@polepole This works very well. Thanks.
