I really appreciate that DALL-E tries to be more diverse; I think this feature is imperative.
But I don’t think it’s good if DALL-E ignores written descriptions. In my opinion it would be better if DALL-E kept the explicitly written features and only touched the ones that aren’t written out. Take blond hair: if the prompt says blond hair, the picture should have blond hair. Other features… go wild.
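As a rough sketch of that idea: keep whatever the prompt states explicitly and randomize only what it leaves open. Everything here is hypothetical and purely illustrative (the `ATTRIBUTE_POOLS` table and the `fill_unspecified` helper are made-up names); it says nothing about how DALL-E actually works internally.

```python
import random

# Hypothetical attribute pools, for illustration only (not DALL-E internals).
ATTRIBUTE_POOLS = {
    "hair": ["blond", "black", "brown", "red", "gray"],
    "age": ["young", "middle-aged", "elderly"],
    "build": ["slim", "average", "heavyset"],
}


def fill_unspecified(explicit, pools=ATTRIBUTE_POOLS, rng=None):
    """Return a full attribute set: explicit values are kept untouched,
    and only attributes missing from the prompt are picked at random."""
    rng = rng or random.Random()
    result = {}
    for name, options in pools.items():
        if name in explicit:
            result[name] = explicit[name]       # written in the prompt: keep it
        else:
            result[name] = rng.choice(options)  # not written: go wild
    # Explicit attributes that have no pool are passed through unchanged.
    for name, value in explicit.items():
        result.setdefault(name, value)
    return result


# The prompt only fixes the hair colour; age and build are free to vary.
print(fill_unspecified({"hair": "blond"}))
```

Run it a few times: the hair stays blond every time, while the other features change.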
It would be nice if DALL-E had:
An option for “Variations” or “Edit” which keeps the picture but changes something globally, depending on the instruction.
Like
“Make the woman slightly older”
“Add some weight to the woman”
“Give her a Southern European look”
Edits (on an erasing basis) don’t work for this, and for Variations you can’t give any instructions about what sort of variations you want. I noticed this when creating a fantasy creature, but it seems just as relevant here.
Maybe a “temperature” scale, like the one on OpenAI’s davinci for example, where you can choose how much freedom DALL-E takes with the written description: from keeping all written instructions to keeping just the idea.
I think the aging problem is harder to solve than it appears. For example, by Argentinian standards, people in the United States aged, let’s say, 16-19 look 20-27 from our viewpoint. I think they need more training images and better labelling, but I get that this costs a lot of money, and where are they going to get such a huge volume of images under the current data protection paradigm?
That’s why I thought an option to take the whole picture and tell it to make the person look “slightly older”/“older”/“much older” would be useful.
Erasing the face and telling it to appear older/younger results in a completely different face, so that’s not an option.
If you could tell it to make the person older/younger, it wouldn’t matter what standards you have. If you start with an age of 30 and it doesn’t fit, you can adjust.
After some tries I got this. That’s it, more or less, but I think it could count as a proof of concept?
Hmm, perhaps the reason is the word “elf”? Even if the models don’t understand the concept of this fantasy creature itself, I’m sure they can grasp that elves are usually depicted as young, even when they are hundreds of years old. What I mean is that when the models translate images into numbers, perhaps the age variable hardly affects this particular completion. For example, a 200-year-old elf may be depicted as a young person in their 20s.
Ah okay, that’s funny: to me the left one looks older than the right one. More “wrinkles”, etc.
I thought my error was in telling it “first” and “second”. In my culture, left would be considered first, but that’s not what DALL-E might do. And so I got it in the “wrong” order.
I thought this was depicting the same character at different ages, so to me it seemed possible to get the same character at different “ages”. Therefore, adjusting a character to be younger/older should be relatively easy.
Some words may be the problem, and that might be the case with the words older/younger. For example, I found in Codex that “make the image smaller/shorter” didn’t work as well as “lower the image’s height”. So a solution may be to ask for an elf in its infancy, childhood, or adolescence.
DALL-E absolutely needs to learn to count! This would have an impact on what you’re trying to achieve here. So, instead of using too many words, you can give the exact same description and just change the age, e.g. “20 year old female elf princess, black hair, green eyes, blue dress” then “40 year old female elf princess, black hair, green eyes, blue dress” - if you’re working on the same project, DALL-E should recognise this and adjust accordingly.
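If you want to generate those prompts without retyping them, a minimal sketch could look like this. The `age_variants` helper and its template are just illustrative names, and whether DALL-E actually respects the number is exactly the open question here.

```python
def age_variants(description, ages, template="{age} year old {description}"):
    """Build one prompt per age, keeping the rest of the description identical."""
    return [template.format(age=age, description=description) for age in ages]


description = "female elf princess, black hair, green eyes, blue dress"
for prompt in age_variants(description, [20, 40]):
    print(prompt)
```

Keeping everything but the age fixed also makes it easier to see whether a difference between two images comes from the age or from some other wording change.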
In my case, I create Tarot decks and just getting a simple number of items on an image… well it simply doesn’t work.
Love your work Aline.
I was wondering about that too… “first” and “second” will be different depending on whether you read right to left or vice versa. My guess is that it’ll pick whichever side looks older early in diffusion, and then make it older still. I’d be curious whether the older one switches sides across multiple tries and examples. In general I’ve noticed that in many diffusion models there’s a heavy influence of “what does the algorithm see there” in addition to “what the prompt says”, so if first/second is vague it’ll probably go by what it sees emerging.