DALLE creating a distinctive fantasy creature

Ameseth · July 19, 2022, 2:54pm

I am a freelance author for a small German pen & paper RPG. Illustrations are scarce, and getting your idea translated into a picture is troublesome, especially when you have to communicate your ideas to the artist illustrating them and usually have to live with whatever they come up with from your initial description.

You can imagine how thrilled I was to get the chance to try DALLE. What this alone can do for illustrating fantasy RPG material is mind-blowing.

While generating illustrations of fantasy creatures I discovered some “problems”.

First, I got the impression that as soon as you ask for fantasy creatures instead of real ones, the quality of the artwork drops significantly. I guess that’s due to the many not-so-professional fantasy creature depictions out there.

For example:

to

And horse to unicorn is as simple as it gets.

I was trying to recreate a creature I made up for a fantasy pen & paper RPG.

This is the creature, illustrated by Florian Stitz and published by Uhrwerkverlag:
Flederkatze

The first try got me:

Then I tried to get more of a jungle-cat fur pattern:

This got me the closest (I think):

(5th picture)
But I couldn’t refine it…

(I erased where the webbed wings should be)

Then I tried to rephrase:
(A fantasy creature, a ozelot with bat like webbed wings extending from its paws to its body at the frontlegs, digital art, realistic)

Well, nope

Things I considered an issue while doing this:

It would be nice to know how many attempts you have left. It is harder to refine things if you run into a full stop and have to start over the next day (or several hours later).
DALLE is very biased, even against poor bats: nearly every picture, even when I tried to make it “friendly”, had some dark/mean vibe. Fruit bats are such adorable creatures!
It would be nice to be able to define the mood and/or expression of creatures (characters) in a picture to counter biases.
DALLE could use some anatomy lessons. Being able to phrase where and how parts connect, e.g. that webbed wings connect from the arms to the body, would help tremendously in creating fantasy creatures.
Edit function:
It would be nice if the edit function offered more than erasing. Erasing the “wings” and telling it to make them different often resulted in no wings at all.
I also tried this with the face on a more expressionistic piece of a woman. She got a face which didn’t fit my vision of the picture; I tried to tell it to change the face by erasing it, and I ended up mostly with no face at all.
It would be nice (again, to counter biases) to have an option to edit the mood and/or art style globally.
When I tried erasing everything and telling DALLE to change the mood, I got strange patterns.

I am under no illusion: this application (creating fantasy creatures) is very niche, but I hope this gives some ideas which help overall.

antonio.ciolino · July 20, 2022, 12:48am

It’s not niche. And I’m jealous! I don’t have access yet.

jon.oakes · July 20, 2022, 11:43pm

I am also exploring fantasy adventure content. I’ve been having fun learning how to describe D&D rooms so that I can get visuals from them. Looking at your sample I added a few elements that might help you.

My results are not better, but adding a bit more source content got closer to that ‘D&D feeling’.

“a pen and ink drawing of a D&D creature from the monster manual. An ocelot with bat wings extending from it’s front legs. Professional Digital Art.”

(So just adding a reference to D&D and the Monster Manual captured the rough quality of that style.)
Might not be what you’re looking for, but I support your efforts!

jon.oakes · July 20, 2022, 11:47pm

And sometimes you strike gold. (At least my kind of gold)

a pen and ink drawing from the Monster Manual of a large furry monster with 3 spiral horns, 5 eyes and 2 large muscular furry clawed arms showing the whole monster, not cropped.

Ameseth · July 21, 2022, 7:33am

Thank you so much for trying it out!

Your D&D suggestion will be a help in the future.
But I still can’t get the webbed wings to extend between body and front leg, like webbed wings do.

Artist: Florian Stitz; published by Uhrwerkverlag

I made a few more attempts with
“a pen and colored ink drawing of a D&D creature from the monster manual. An ocelot with bat wings extending from its frontlegs, no frontlegs. Professional Digital Art.”
and
“a pen and ink drawing of a D&D creature from the monster manual. An ocelot with bat wings instead of frontlegs, no frontlegs. Professional Digital Art.”

But it never does the web-winged front legs like in the “original”.

I was trying to push DALL-E toward a new concept: to realize an artist’s/author’s vision rather than just letting it run wild and provide creative inspiration.

Don’t get me wrong. It is still amazing, and the opportunities with DALL-E are already boundless. But as someone who writes for P&P fantasy worlds, sometimes you don’t want “just” something like it, but this one special thing. And either DALL-E is lacking the ability to render a distinctive vision (out of the ordinary), or I am lacking the correct approach. I guess it’s a bit of both.

It might be worth exploring this further: how far DALL-E is currently able to create plausible imaginary things (like fantasy beasts from D&D, Magic: The Gathering, Elden Ring), either with a strict template or just running wild.
If a certain monster exists in a fantasy world and you want to create a picture with that monster, which is “unknown” to DALL-E, I guess that’s hardly possible at the moment.

Sadly I can’t delve further into this since I can’t afford the pricing model.
I am so happy I had the chance to test with 50 generations a day, even if it was only for a couple of days. The fun I had and what I learned from it are astounding.
Thanks to OpenAI at this point for this opportunity!

That brings me to templates.
It would be amazing for concept artists if there were templates you could use.
Like for a monster design:
“Monster Sheet”: getting it in a dynamic pose, front, side, and back view, and a head close-up.
“Monster Color Variation”: keeping the same monster but with different color schemes.
These could vary in token cost.
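
Until something like this exists, the two templates above can be roughly approximated by generating one prompt per view or color scheme from a shared description. The sketch below is only an illustration under stated assumptions: the function names, the view list, and the prompt wording are made up for this example, nothing here calls DALL-E, and separate generations are not guaranteed to show the same monster.

```python
# Hypothetical helpers: they only assemble prompt strings and call no image API.
VIEWS = ("dynamic pose", "front view", "side view", "back view", "head close-up")


def monster_sheet_prompts(creature, style, views=VIEWS):
    """Return one prompt per view, all sharing the same creature and style text."""
    return [f"{creature}, {view}, {style}" for view in views]


def color_variation_prompts(creature, style, schemes):
    """Return one prompt per color scheme for the same creature and style."""
    return [f"{creature}, {scheme} color scheme, {style}" for scheme in schemes]


if __name__ == "__main__":
    creature = "An ocelot with bat wings extending from its front legs"
    for prompt in monster_sheet_prompts(creature, "pen and ink drawing"):
        print(prompt)
```

Keeping the creature and style text identical across prompts is the only lever this sketch has for consistency; whether the model honors it is exactly the open question raised above.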

jon.oakes · July 21, 2022, 2:29pm

I like the templates idea. Like, what if we could upload a ‘rigged’ model designed in 3D (but projected to 2D) and have ‘bones’ that tell DALL-E what we’re after?

Anyway, I tried this prompt to see if it even knew what a cat’s front legs were and got SORTA close?

A detailed black and white pencil drawing of a cat’s front legs with wings like a bat.

But as soon as I make any reference to ‘the rest of the cat’ the wings pop up on the back.

You’ve found a good example of the limitations of this version of the model. Maybe someone will see the template idea and implement it in DALL-E 3.

Ameseth · July 22, 2022, 8:45pm

Thank you for testing it too!
Everything is so new, and not working doesn’t equal not possible in many cases.

I am so curious to see the further development of these AIs, especially in the concept art area.
Although it’s hard to use for distinctive stuff, since you can’t define a specific “new” style or teach it characteristics of a fantasy world (yet).

dennyroberts · July 25, 2022, 2:32pm

Quick note, when you are erasing stuff and want it to fill in the blank, you have to give it the full prompt (so “A fantasy creature, a ozelot with bat like webbed wings extending from its paws to its body at the frontlegs, digital art, realistic”) and not just what you want it to fill it in with (aka “batlike wings”). That explains why there were basically no changes to those images. Cheers!
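
As a rough illustration of that note, the sketch below builds a prompt that describes the whole image, with the erased detail embedded in it, instead of submitting the fragment alone. The helper name `full_edit_prompt` and the example wording are assumptions made for this sketch; it only assembles text and does not call DALL-E.

```python
# Hypothetical helper, not part of any DALL-E tooling: it only builds text.
def full_edit_prompt(scene, erased_detail, style_tags):
    """Describe the WHOLE image, with the erased detail embedded in the scene."""
    tags = ", ".join(style_tags)
    return f"{scene} with {erased_detail}, {tags}"


# What NOT to submit on its own after erasing part of the image:
fragment_only = "batlike wings"

# What to submit instead: the full scene, including the part to be filled in.
full = full_edit_prompt(
    "A fantasy creature, an ocelot",
    "batlike wings on its front legs",  # illustrative wording only
    ["digital art", "realistic"],
)
print(full)
# A fantasy creature, an ocelot with batlike wings on its front legs, digital art, realistic
```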

Ameseth · July 25, 2022, 3:05pm

Thank you for the note. I wasn’t sure how to do it correctly.

I tried both back then, and I tried again now to validate, but it has no effect.

Only “batlike wings”

Same text “A fantasy creature, a ozelot with bat like webbed wings extending from its paws to its body at the frontlegs, digital art, realistic”
then:

today:

And slightly altered text:

I also tested various erasing areas.

dennyroberts · July 25, 2022, 3:37pm

Very weird… which part of the image are you erasing? It looks like almost nothing is changing except the pattern on the flank and the webbing between the legs.

Ameseth · July 25, 2022, 3:56pm

I tried various erasures; I didn’t keep them all, but:

For this:

I got:

Reese · August 24, 2022, 7:45pm

Hi @Ameseth (and @ all others)!

While I lack the sophisticated knowledge to fully understand [AI] (and always end up eagerly following links to “a new and very simple method to… [AI]” just to be confronted with crazy algorithm-math, jokingly thinking that maybe “AI is writing papers about AI, and ML/data scientists don’t want to admit they also don’t understand these papers”) - I’d say I made up for that, in part, with “excessive trial-and-error” during “lockdowns”, and especially utilizing the wonderful AI named CLIP (the “old” CLIP released now just over a year ago) “running local”.

I’ve since:

Had an epic brain-flick moment when I realized that “all that hate about gender-stereotypical bias in AI on social media” is, in fact, a problem on the human/NLP side of things: namely, the English language used in prompts, which doesn’t assign a grammatical gender to each and every noun. And that I could, in fact, use RuDALLE - AFAIK a version (“fine-tune” [?]) of CLIP that stands out for taking Russian-language prompts - to generate SO MUCH more precise, highly differentiated output… by working on “my” side, the NLP prompt, i.e. “switching my input language for prompts to a more complex one”. (Hold on, this is related to your work!) So, with a car being naturally “female” in RU, you can change the preceding adjective to wrongly address “car” as neuter or masculine (this works the same in RU as in DE - high five, fellow German, because this is near-impossible to explain to a native monolingual EN speaker - I’ve tried…) - e.g. “huebscher Auto”, “huebsches Auto”, “huebsche Auto” (beautiful car) - to produce 1. an aggressive-looking sports car, 2. a normal car (for RU “car” = fem.), 3. a small European-looking car.

This - technically - should also apply for changing how gentle / aggressive / neutral your cat looks.

Wait, how does that apply to the English DALL-E?

I came across this when feeding CLIP an image of a friend on a bicycle in 2021 (in gradient ascent, so basically “getting back a text ‘opinion’ of what CLIP ‘sees’ in the image”). Instead of the typical spot-on, astonishing “nailed it!” response after just a few hundred iterations, CLIP went on a “racist” anti-German slur. Which - knowing AI doesn’t have emotions or intentions - sent me into a fit of laughing to the point of tears at this top-class cabaret-style comedy impression that was a) unexpected, b) still understandable in “how [the AI] came to the conclusion, albeit derailed” and c) thus generated maximum entertainment. I attached an excellent example of this: a photo of my “Alexa” that I gave to CLIP, and then photoshopped to remove the “typographic attack vulnerability exploitation”, i.e. the German text. The difference is… exactly as mentioned in the prior example.
PS: I am sorry this image “lacks professional language” with referring to the typographic issue as “CLIP’s OCD reading bias” - it was made and tailored to a young lady who deeply enjoys receiving images exactly THIS way, and I didn’t save another (without annotations) - my bad, and apologies to anybody who might find this “inappropriate” - that was neither the AI’s NOR my intent! I am adding this as I believe it’s a very useful background info with regard to the CLIP’s tokenizer / generated ‘opinion’ tokens as they still seem to vaguely apply for DALL-E 2 as via my trial-and-error (see below):

So I had to know what happened and found out: It was related to the abundance of German language text seen on the bike frame (a cargo bike), and this top-of-the-tips comedy event was known as a “typographic attack vulnerability”. And also, CLIP’s dataset is heavily weighted towards “English”, but is in no way limited to English - CLIP (as in, “the original”, as released “free to the public” by OpenAI [thank you so much, I owe you my Covid-maintained excellent mental health, not even kidding!]), knows a multitude of different languages - including quite a bit of German, and Russian, and…
Subsequently, I realized that CLIP “naturally” creates not only German longwords, but longwords in any and all languages. “Probabantenna”, “particleweaving spiderrollercoaster”, and “artificialintelligence trippyeyes library” are some of the “actual, unaltered, as-is” best_of results returned by CLIP models “looking at an image” (fun fact: the last one was “CLIP looking at a screenshot of the AI’s activation atlas”).
These “opinions”, especially but not limited to “when feeding the init image the ‘opinion’ was based on” work really, exceptionally well for creating a desired output. Even if it’s a desired output as given by the AI itself, it’s absolutely marvellous.

More directly related rather than background information, with regard to your project:

While “bat” in German is “Fledermaus”, so “one word”, in Russian it is, literally translated, “flying mouse”, with the space; Fleder+NewWord doesn’t necessarily make as much sense, but let’s put that to the test. First, though: using RuDALLE, I had really great results with “the old CLIP” for making “Flying Rats”:
[edit: I had to smash this together because “not allowed for new users” to post more than one attachment… Also contains stuff mentioned further below]

I already noticed that, together with astonishingly amazing quality AND overall coherence, DALL-E 2 also doesn’t seem quite so “rigid” with regard to concepts that have very few (if any) actual “representations” in real life, such as “soul” or, of course, “AI”, which, in the “old CLIP”, was basically represented by “red, monocular vision”. Very clearly a Hollywood-inspired stereotype - and no longer present as such a “rigid concept” now, meaning you can no longer count on receiving “a red Terminator-style monocular vision” just by mentioning “AI” in the prompt; so “things might be more complex now”, and the hundreds, likely even many thousands of things I made with CLIP cannot just be applied to DALL-E to receive “the same, but better”. It’s, in fact, a whole new quality, a whole new level [of awesome].

So let’s put all of the above knowledge to the test with DALL-E 2!

1.: летучая мышь - RU for: “flying mouse”. Result: #dalle generates mouse. Just mouse. Conclusion: Initial evidence points towards “this AI knows multiple languages again, too!”

2.: летучаямышь - the same, just as a “longword”. Interesting result: It creates a flying-[stripped-input]. All are “animals”, two examples:
[edit: I had to smash this together because “not allowed for new users” to post more than one attachment… see above]

Fledermaus - German for “bat”. Generates… you guessed it: Bats! Perfect bats! One example:
Iris × DALL·E | Fledermaus
Bold move: “Fledermaus but as a Katze” (bat but as a cat): Not sure what’s going on here, but it looks like this could be fun!

[Edit: “Sorry, new users can only add two links to posts”]. Alas, I will have to post this later, albeit the links being to openAI’s domain; I will once I know I won’t be tripping a spam filter and be classified as “an AI, not a human”. Sorry!

Flederkatze; a wingedcatbat: The AI appears a bit confused, but with interesting results:
FlederKatze; a wingedcatbat with green cateyes and an adorable kittyface and a fluffy tail.
Now AI & I are both confused, but I - personally - totally wanna pet that fluffy batcat’s belly!
Finally, copying parts of your prompt for “style hints”:
FlederKatze; a flying wingedcatbat with batwingfrontpaws, green eyes and an adorable kittyface and a fluffy tail. photorealistic, professional digital art


Those were the best two out of a one-shot attempt [4 generations], with the others having the same issue as you reported - wings on the back.
However, with one having “front-pawed wings”, this is “going in the right direction” - although it needs more work to refine the prompt, and more credits for the attempts, so I’m gonna stop here.
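
For anyone who wants to refine this systematically, the “longword” trick used in the prompts above (e.g. “wingedcatbat”, “летучаямышь”) can be sketched as a tiny helper that produces the spaced and the compounded form of a phrase to try side by side. The function names are made up for this illustration, and how DALL-E actually reacts to such compounds is only what the trial-and-error above suggests, not something this code can show.

```python
# Hypothetical helpers for building "longword" prompt variants; text only.
def longword(phrase):
    """Collapse a spaced phrase into one compound word by removing whitespace."""
    return "".join(phrase.split())


def prompt_variants(phrase):
    """Return the spaced phrase and its compounded form, to compare results."""
    return [phrase, longword(phrase)]


if __name__ == "__main__":
    print(longword("winged cat bat"))   # wingedcatbat
    print(longword("летучая мышь"))     # летучаямышь
    print(prompt_variants("bat wing front paws"))
```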

I am sorry if this is “too verbose” and a wall-of-text, but I genuinely hope that being in-context, it will serve to be helpful for your successful creation of art-as-intended!

Wishing you good luck, success, and most of all - a lot of fun!

Reese (Iris)

Ameseth · August 24, 2022, 9:01pm

Weeeee down the rabbit hole again!

I noticed a few days ago that I could mix German and English, but I didn’t think about it any further.
But your post gave me a game-changing clue. Compound words!!

This is the closest I ever got: