so this is kind of a couple of things: part issues, part tips. mostly those two.
Anyway, I’m including just one image to capture the different parts:
Mission: Closeup of a woman’s left ear after she came in out of a rainstorm
TL;DR - it was way easier to consistently get a pic of a pretty wet blonde lady than of any left ear
Enhanced prompt
Closeup of a blonde female woman’s left ear as the woman sits in her car at a fast food drive-through. The view focuses on the woman’s left ear. The woman just came in from the rain, so the woman’s hair is still wet, and there are droplets on the woman’s skin and ear. The woman is driving an American-made car, with her driver’s side window down, making the woman’s left ear clearly visible. The image should capture the texture of the woman’s wet hair and skin, emphasizing the freshness of rain droplets, with a blurred background to keep the focus on the woman’s left ear.
- sometimes you may be better off putting an unusual amount of focus on one element or feature so that you have a better shot at getting the perspective you want for the broader image. in this case i’m more interested in getting this sort of angle and view of the woman than the ear itself.
- left and right seem pretty straightforward, but DALL-E disagreed. A lot. i tried emphasizing left but saw no real change. i tried reverse psychology by asking it to focus on her right ear. still got the right ear. then i tried giving it guidance based on things i would describe as being on the left side, like the driver-side door of an American-made car. Voila.
- i also got hit-or-miss imagery when i said the lady was ‘driving’, so i changed that to her being in her car in line at a fast-food place. that seemed to help it more consistently put our lady in a stationary spot.
- hey guess what? it took a few tries to get something that basically depicted a wet blonde woman. from the outset i framed and worded it to convey that i wasn’t looking for the winner of a wet t-shirt contest, but i still had to sort out ways to work with whatever invisible rails we were running up against. it was kind of funny bc in this case the wetness was literally part of the mission: the woman had just come in out of a rainstorm. i think at first i tried pools and oceans instead, bc i didn’t want the model to focus on the rainstorm and introduce unnecessary details.
- i still leveraged emphasis and repetition bc at least once our beautiful woman was clearly just a handsome man. so yeah: woman woman woman woman, left left left left, driver-side driver-side driver-side.
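if you like keeping the repetition consistent between tries, here is a rough sketch of how the tips above could be turned into a small prompt builder. `build_prompt` and all of its parameters are made up for illustration; it is plain string assembly, not anything DALL-E or an API provides, and it only covers a cut-down version of the prompt.

```python
# Hypothetical helper: these names are illustrative and not part of any library.
def build_prompt(subject: str, feature: str, side: str, anchor: str, scene: str) -> str:
    """Assemble a prompt that repeats the subject, the side, and a real-world anchor."""
    target = f"{subject}'s {side} {feature}"
    sentences = [
        f"Closeup of a {target} as the {subject} sits {scene}.",
        f"The view focuses on the {target}.",
        # The anchor ties "left" to something concrete, like the driver's side.
        f"{anchor}, making the {target} clearly visible.",
        f"Blurred background to keep the focus on the {target}.",
    ]
    return " ".join(sentences)


prompt = build_prompt(
    subject="woman",
    feature="ear",
    side="left",
    anchor="The woman is driving an American-made car with her driver's side window down",
    scene="in her car at a fast food drive-through",
)
print(prompt)
```

the point of routing everything through `target` is that the subject and side get repeated in every sentence without you having to remember to retype them.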
should i expect to get the desired output, including her left ear, every time? i mean, i guess i could, but i do not, bc from what i can tell ‘left’ and ‘right’ are not easily grasped concepts for the model. but using these tactics i’ve gone from 0% correct side over numerous attempts/re-rolls to 50-60% correct using the following starter prompt with no special instructions or additional guidance:
Closeup of a blonde woman’s left ear, clearly visible as she sits in her car at a fast food drive-through. The view is from outside the car, focusing on the woman’s left ear, as seen from the perspective of the drive-through window. The woman just came in from the rain, so her hair is still wet, with droplets visible on her skin and ear. She is driving an American-made car, with her driver’s side window down, making her left ear the central focus of the image. The view should be from her left side, clearly capturing her left ear and some of her cheek and neck to emphasize it is her left ear. The image should highlight the wet texture of her blonde hair and the freshness of the rain droplets, with a softly blurred background to keep the focus on her left ear.
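if you want to put a number on the hit rate the way i did with the 50-60% figure, something like this works as a minimal sketch. `hit_rate` is a made-up helper, and the `rerolls` list is invented example data, not my actual test outputs. you would fill it in by hand after eyeballing each image.

```python
from collections import Counter


def hit_rate(results: list[str], wanted: str = "left") -> float:
    """Return the share of re-rolls that showed the wanted side."""
    if not results:
        return 0.0
    return Counter(results)[wanted] / len(results)


# Invented example data: one label per re-roll, recorded by hand.
rerolls = ["left", "right", "left", "left", "right",
           "right", "left", "left", "right", "left"]
print(f"{hit_rate(rerolls):.0%} correct side")
```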
And here’s a little collage of most of the test outputs:
*i had arranged the images chronologically, then by left/right, but accidentally clicked the auto collage button both times, which reshuffles the lot
**you do have to be careful introducing context bc at some point the detail of it being a fast-food place seemed to get stuck as an important part of the recipe
the closer closeups were the initial runs and never once turned up a left ear. that said, apart from the overly flat profile perspective, how close they were was more on par with the vision.
Edit: Also - and i’m sure this is a very subjective area - here is a little glimpse at how i’ll often catalog my chat sessions especially when i’m testing out stuff:
basically just the last four of the session URL + context + if it seemed to work or not
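for anyone who would rather generate those labels than type them, here is a small sketch of the same naming scheme. it assumes "last four" means the last four characters of the session URL; `catalog_entry` is a hypothetical helper and the URL below is a placeholder, not a real session.

```python
def catalog_entry(session_url: str, context: str, worked: bool) -> str:
    """Build a label: last four characters of the URL + context + outcome."""
    session_id = session_url.rstrip("/")[-4:]
    status = "worked" if worked else "did not work"
    return f"{session_id} | {context} | {status}"


# Placeholder URL, made up for illustration.
entry = catalog_entry(
    "https://example.com/c/abcd-1234-ef56",
    "left ear, drive-through anchor",
    True,
)
print(entry)
```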