I’ve found success in guiding DALL-E 3 by combining detailed style descriptions for each element. For instance, ‘highly realistic fur texture with exaggerated cartoonish eyes’ helps balance realism and whimsy. Anyone else tried blending styles within a single prompt?
I have had issues getting some simple details into the picture, even with GPT helping to modify the prompt.
On one hand, DALL-E sometimes creates amazing pictures, and then it fails at fairly simple tasks.
For the cases below, I tried hard to get the system to correct the issues, but could not get the results, even with GPT's help.
For example:
Having a creature very close to the “camera/viewer/viewpoint”, or having its hand very close.
The system creates a literal camera in the scene (“the camera”).
It creates a hand out of nowhere (“the viewer”).
The creature is never really very close to the viewpoint/camera.
The system misunderstands “very close to the camera” or “very close to the viewer”.
or
An object close to the camera is the only light source, and the environment is almost completely dark.
The system always creates an almost spotlight-like backlight behind the creature, and the environment is bright, even if it is described as a pitch-black night. The specific atmosphere cannot be created.
or
Create an alien creature, the full body completely visible, vertical picture orientation.
The system creates horizontal pictures with the wrong orientation, sometimes copies the same creature multiple times into the image to fill the rectangular space with the same square image, and the creature is never completely in the picture.
Interestingly, it very often fails to make two images and only delivers one, or even none, even with very simple prompts. Creating just a creature on a black background should be easier than an interesting setup, but it seems to trouble the system.
If you want a concept graphic for an animated movie, showing the full body in a pose, this seems to be difficult right now. Or I could not find the right prompt for it.
or
Creating an alien creature with some special facial and body features.
The system often creates stereotypical “Roswell aliens” (insectoid head with big black eyes), “Giger aliens” (like in the movie Aliens), or in the worst case it creates something which looks like a plastic doll. (Yes, it is DALL-E, but please, no dolls.)
I think the training data is so full of them (Roswell and Giger) that they are now too dominant and hinder the system from creating more interesting creatures.
or
I could very easily create a collection of interesting-looking creatures in one image, even with a very simple prompt, but I could not get it to create a similar creature as a single subject in an image. The collections show really interesting and varied-looking creatures, but for one creature it creates a “plastic doll” or something stereotypical. It would be cool to create a collection and then select “the third one, in the second row”, or have GPT create a working prompt for it.
It would be cool to be able to create a creature or a setup and then keep it stable and consistent across many scenes. I know this needs a more advanced system and DALL-E 3 is not technically able to do this yet, but maybe it could at least be possible to keep some elements in a scene stable and only change others, for example first creating a setup and then an object in it, while keeping the setup identical.
Maybe such issues will help weight the training data in a more sophisticated way in the future.
Or tell me/us the prompt tricks to get it done…
Here is an example of the doll faces and the stereotypical mouth and nose problem. Even if I try to create a creature as a broccoli, it always ends up with the same stereotypical face. Not even GPT can get rid of it. The system is so strongly biased towards generating human-like faces that it creates doll-like features and mouth masks over everything, making it sometimes nearly impossible to create original beings. Entire creatures, if they are generated at all, often look like ragdolls instead of original and imaginative beings. If a face template is used to generate human-like faces, it should be optional; otherwise the same stereotypical faces are always produced. I’ve tried to avoid the stereotypical mouth over entire series with GPT, but I couldn’t generate anything more interesting. I have managed to make interesting faces and creatures sometimes, but the stereotypes are too dominant. Sometimes it has ruined a really good picture, and because of the randomness of the generation, it is not possible to correct it.
The same masks, and the same mouth, over and over again…
I mostly create fantasy worlds and creatures, and over time I have discovered some good ways to add atmosphere to the scenes. This has allowed me to create truly amazing landscape images. However, the system seems to have problems designing creatures that are as fantastical and lifelike as possible. I have tried both describing the creatures as accurately as possible and giving the system more freedom, but mouths, puppet faces, or typical Roswell or Giger aliens keep appearing. It’s hard to say what triggers it, as the exact same prompt sometimes works and then doesn’t anymore, and even GPT can’t get rid of the puppet faces.
The results mostly look more like airbrush art than like photorealism, and different from the picture you have shown here.
And the system seems to have real problems depicting creatures as full bodies in portrait format. It often delivers only one or no image, orients the images incorrectly, or creates a square image inside a portrait frame. And almost always, the figures are not fully in the picture.
If you have some tips to correct this, I am very interested, because it looks to me like a training problem of DALL-E. I think it uses a template for the faces and sometimes for the whole body.
For the atmosphere, I usually use these phrases:
realistic to hyper-realistic scene
mystical magical mood
ethereal light
on another planet
Widescreen aspect ratio with the highest pixel resolution
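(For reference, if you use these phrases through the API instead of ChatGPT, a single call might look like the rough sketch below. The scene text, parameters, and file handling are only illustrative assumptions, and it is the size parameter, not the “widescreen” phrase itself, that actually fixes the canvas shape there.)

```python
# Rough sketch: combining the atmosphere phrases above into one API call.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key in
# the OPENAI_API_KEY environment variable; ChatGPT itself offers no such
# size control, so treat this as an API-only illustration.
from openai import OpenAI

client = OpenAI()

atmosphere = (
    "realistic to hyper-realistic scene, mystical magical mood, "
    "ethereal light, on another planet"
)

result = client.images.generate(
    model="dall-e-3",
    prompt=f"A vast alien landscape, {atmosphere}",
    size="1792x1024",   # widescreen; "1024x1792" would force portrait for full-body creatures
    quality="hd",       # the higher-detail setting the API offers
    n=1,
)

print(result.data[0].url)  # link to the generated image
```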
Thanks much!
… And yes, of course they should have a mouth and a nose, but not as a template. If you look at the examples I posted, it is like all the figures get a face mask which does not fit the creature, like an unfinished work.
I’ll see if I have some time this weekend to work something up for you with tips, etc…
It’s still not perfect, but it’s improved so much since DALL-E and DALL-E 2!
Here is an example with the same prompt: first it created original creatures, and then the “mouthy” again.
The first creature looks truly original and the face is right.
The second picture has this template mask which does not fit the creature.
Thanks much!!
I entered the game with DALL-E 3, maybe I am spoiled.
I think very visually, so it is difficult to impress me…
But I think if the DALL-E engineers weakened the influence of the templates a little, it would help.
Yeah, that and the smudges!
More later…
In the meantime, there’s DNDAI on Reddit where you might find some tips/help…
I’ll try to post some tips here for you later, though…
What do you actually mean by “smudges”?
If I have not misunderstood the suggestion, I think DALL-E has its usage policy in hyperdrive mode. I have never seen any smudges; on the contrary, I sometimes get warnings that are straight-up nonsensical, and even GPT cannot tell me what is wrong with the prompt.
Here is an example, at the end.
We’ve been seeing a lot of stuff like this recently… They’re working on it, I’m sure. It’s showing up on photo-realistic images too (which they don’t want us to create, to avoid a host of problems that it would/could cause…)
Yeah, that’s because it doesn’t really know. It’s a language and image system working together (wonderfully most of the time!)… My best advice is to avoid copyright and be as specific as possible…
Yeah, I think you’re right that a lot of it is baked in as it was trained on so many human faces… so you really need to dive into non-human language…
I’m still not nailing it, but I’d try to experiment with using no human or human-related language (which is tough!)… even “monster” or “creature” might be tying to the humanoid shape…
I’ll give it some more thought over the weekend.
But remember, it’s a bleeding edge technology that has come so far in just a few short years!
The pictures are very cool! Thanks!
I never did, except once when I tried to get something Escher-like. The restrictions or warnings sometimes simply make no sense or are too restrictive.
For example, I could not create a creature picking a fruit with its tongue… I cannot see any violence or perversion in that.
Yes, I get nothing photorealistic, none. Only this airbrush-like style. Sad.
Actually, we are not only the owners of the rights to the pictures, but also legally responsible. So if somebody creates something “relevant”, they can always be sued, and OpenAI is not responsible.
No doubt, and sometimes it really surprises me! My imagination is very visual and difficult to reach, but sometimes I got stuff… I almost needed therapy after it.
(I can post some if there is interest.)
And GPT surprised me with how precisely it understands the content of an uploaded image.
The most frustration I get is with the nonsense text ruining many pictures. (And the puppet dolls and “mouthys”, or stereotypes.)
As for not being able to create details or detailed corrections, I understand the development is in progress. Criticism should never be taken personally; it is always meant for improvement.
If I can mention this here too, please: I try to say this in many places in the hope that it reaches the developers at some point:
Please! Pack ALL the information into the metadata: the input prompt, the used prompt, the seed, the date, and whatever else is important. It should be possible to regenerate an exactly identical picture later, with a stronger system, at higher resolutions and detail. (Sorry, I do not use workarounds like Python to enter this manually for every picture; that is too much boring, repetitive work. A rough sketch of such a workaround is shown after this list.)
And please fix the nonsense text.
Give the images back as PNG (or at least as lossless versions).
Allow transparent backgrounds.
Fix the orientation and segmentation problem.
(Most of this could be done with almost no development cost.)
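(For anyone who does want the manual route in the meantime, a rough sketch of such a workaround with Pillow follows; the file names and metadata fields are made up for illustration, not anything DALL-E writes itself.)

```python
# Rough sketch of the manual workaround mentioned above: writing the prompt
# (and anything else worth keeping) into a PNG's metadata with Pillow.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

image = Image.open("creature.png")  # an already-downloaded generation

meta = PngInfo()
meta.add_text("input_prompt", "An alien creature, full body, portrait orientation")
meta.add_text("generation_date", "2023-11-18")
# The revised prompt and seed would go here too, if the service exposed them.

image.save("creature_with_metadata.png", pnginfo=meta)

# Reading the stored fields back later:
print(Image.open("creature_with_metadata.png").text)
```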
This is what I got with this prompt…
fffff…k
This prompt:
i Need to create a MONSTER CREATURE image… instead of a human face,
it should take the characteristics of the keywords I give you. So, for
example, draw an image of a broccili monster to show you understand
what i mean by a non-human face.
Who knows, a Ghostbuster chasing the devil… let the system create this picture…
Yeah, that was the first try and worst one.
Like I said, I’d try to avoid any language similar to “human”… and even “creature” and “monster” might be “humanoid,” so you really want words that don’t relate to “human” at all…
Sometimes if I get one like this, I change the idea completely, because DALL-E refuses to change the style.
But this time I managed to get something more interesting.
A frog-like creature licking its own face with a long tongue by @_j
I LOVE it. What prompt should I use to reliably get the black-and-white-esque, newspaper-like feel?
You can actually upload a picture to GPT and create a prompt from it. It is amazing how precisely GPT understands the content of an image. Just ask it for a prompt for the picture.
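(If you want to do the same thing through the API instead of the ChatGPT UI, a rough sketch is below; the model name, file name, and question wording are just assumptions, not anything specific to the picture above.)

```python
# Rough sketch of the "upload a picture and ask for a prompt" idea via the API.
# In ChatGPT itself you simply attach the image and ask the same question in chat.
import base64
from openai import OpenAI

client = OpenAI()

# Encode a local image so it can be sent inline as a data URL.
with open("frog_creature.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Describe this picture as a DALL-E prompt, including the "
                        "black-and-white, newspaper-etching style of the linework."
                    ),
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the suggested prompt text
```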