Hi @Ameseth (and @ all others)!
While I lack the sophisticated knowledge to fully understand [AI] (and always end up eagerly following links to "a new and very simple method to… [AI]" just to be confronted with crazy algorithm-math, jokingly thinking that maybe "AI is writing papers about AI, and ML/data scientists don't want to admit they also don't understand these papers") - I'd say I made up for that, in part, with "excessive trial-and-error" during "lockdowns", especially by using the wonderful AI named CLIP (the "old" CLIP, released just over a year ago) "running locally".
I've since:
- Had an epic brainflick-moment when I realized that "all that hate about gender-stereotypical bias in AI on social media" is, in fact, a problem of the human-NLP side of things; namely, the English language used in prompts, which naturally doesn't assign "gender" to each and every noun. And that I, in fact, could use RuDALLE - AFAIK, a DALL-E-style model (paired with a Russian CLIP [?]) that stands out for taking Russian language prompts - to generate SO MUCH more precise, highly differentiated output… by working on "my" side, the NLP prompt, i.e. "switching my language for input / prompts to a more complex one". (Hold on, this is related to your work!) So, with "car" being naturally "female" in RU, I can change the preceding adjective to wrongly address "car" as neuter or masculine (works the same in RU as in DE - high five, fellow German, because this is near-impossible to explain to a native monolingual EN speaker - I've tried…) - e.g. "huebscher Auto", "huebsches Auto", "huebsche Auto" (beautiful car) - to produce 1. an aggressive-looking sports car, 2. a normal car (for RU "car" = fem.), 3. a small European-looking car.
This - technically - should also apply to changing how gentle / aggressive / neutral your cat looks.
Wait, how does that apply to the English DALL-E?
- I came across this when feeding CLIP an image of a friend on a bicycle in 2021 (in gradient ascent, so basically "getting back a text 'opinion' for what CLIP 'sees' in the image"): Instead of the typically spot-on, astonishing "nailed it!" response after just a few hundred iterations, CLIP went on a "racist" anti-German rant. Which - knowing AI doesn't have emotions or intentions - sent me into a fit of laughter, up to the point of being in tears, at this top-class cabaret-style comedy impression that was a) unexpected, b) still understandable in terms of "how [the AI] came to the conclusion, albeit derailed" and c) thus generated maximum entertainment. I attached an excellent example of this: a photo of my "Alexa" that I gave to CLIP, and then the same photo photoshopped to remove the "typographic attack vulnerability exploitation", i.e. the German text. The difference is… exactly as mentioned in the prior example.
PS: I am sorry this image "lacks professional language" by referring to the typographic issue as "CLIP's OCD reading bias" - it was made for and tailored to a young lady who deeply enjoys receiving images exactly THIS way, and I didn't save another (without annotations) - my bad, and apologies to anybody who might find this "inappropriate" - that was neither the AI's NOR my intent! I am adding this as I believe it's very useful background info with regard to CLIP's tokenizer / generated "opinion" tokens, as they still seem to vaguely apply to DALL-E 2, judging by my trial-and-error (see below):
- So I had to know what happened and found out: It was related to the abundance of German language text seen on the bike frame (a cargo bike), and this top-notch comedy event is known as a "typographic attack vulnerability". Also, CLIP's dataset is heavily weighted towards English, but is in no way limited to English - CLIP (as in, "the original", as released "free to the public" by OpenAI [thank you so much, I owe you my Covid-maintained excellent mental health, not even kidding!]) knows a multitude of different languages - including quite a bit of German, and Russian, and…
- Subsequently, I realized that CLIP "naturally" creates not only German longwords, but longwords in any and all languages. "Probabantenna", "particleweaving spiderrollercoaster", and "artificialintelligence trippyeyes library" are some of the "actual, unaltered, as-is" best_of results returned by CLIP models for "looking at an image" (fun fact: the last one was "CLIP looking at a screenshot of the AI's activation atlas").
- These "opinions", especially but not limited to "when feeding the init image the 'opinion' was based on", work really, exceptionally well for creating a desired output. Even if it's a desired output as given by the AI itself, it's absolutely marvellous.
More directly related to your project, rather than background information:
While "bat" in German is "Fledermaus", so "one word", in Russian it is, literally translated, "flying mouse", with the space. Fleder+NewWord doesn't necessarily make as much sense, but let's put that to the test. First, though: using RuDALLE, I had really great results with "the old CLIP" for making "Flying Rats":
[edit: I had to smash this together because "not allowed for new users" to post more than one attachment… Also contains stuff mentioned further below]
I already noticed that, together with astonishingly amazing quality AND overall coherence, DALL-E 2 also doesn't seem quite so "rigid" with regard to concepts that have very few (if any) actual "representations" in real life, such as "soul" or, of course, "AI", which in the "old CLIP" was basically represented by "red, monocular vision". Very clearly a Hollywood-inspired stereotype - and no longer present as such a "rigid concept" now, meaning you can no longer count on receiving "a red Terminator-style monocular vision" just by mentioning "AI" in the prompt. So "things might be more complex now", and the hundreds, likely even many thousands of things I made with CLIP cannot just be applied to DALL-E to receive "the same, but better". It's, in fact, a whole new quality, a whole new level [of awesome].
So let's put all of the above knowledge to the test with DALL-E 2!
1.: летучая мышь - RU for: "flying mouse". Result: #dalle generates a mouse. Just a mouse. Conclusion: Initial evidence points towards "this AI knows multiple languages again, too!"
2.: летучаямышь - the same, just as a "longword". Interesting result: It creates a flying-[stripped-input]. All are "animals", two examples:
[edit: I had to smash this together because "not allowed for new users" to post more than one attachment… see above]
- Fledermaus - German for "bat". Generates… you guessed it: Bats! Perfect bats! One example:
Iris × DALL·E | Fledermaus
- Bold move: "Fledermaus but as a Katze" (bat but as a cat): Not sure what's going on here, but it looks like this could be fun!
[Edit: "Sorry, new users can only add two links to posts"]. Alas, I will have to post this later, even though the links point to OpenAI's domain; I will once I know I won't be tripping a spam filter and be classified as "an AI, not a human". Sorry!
- Flederkatze; a wingedcatbat: The AI appears a bit confused, but with interesting results:
[Edit: "Sorry, new users can only add two links to posts"]
- FlederKatze; a wingedcatbat with green cateyes and an adorable kittyface and a fluffy tail.
Now AI & I are both confused, but I - personally - totally wanna pet that fluffy batcat's belly!
[Edit: "Sorry, new users can only add two links to posts"]
- Finally, copying parts of your prompt for "style hints":
FlederKatze; a flying wingedcatbat with batwingfrontpaws, green eyes and an adorable kittyface and a fluffy tail. photorealistic, professional digital art
[Edit: "Sorry, new users can only add two links to posts"]
Those were the best two out of a one-shot attempt [4 generations], with the others having the same issue as you reported - wings on the back.
However, with one having "front-pawed wings", this is "going in the right direction" - though it needs more work to refine the prompt, and more credits for further attempts, so I'm gonna stop here.
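For anyone who wants to keep iterating: the prompt ladder above (bare longword, then longword plus description, then added style hints) can be sketched in a few lines of Python. This is only a sketch under assumptions: `build_prompt` and its parameters are names made up for illustration, and it just assembles strings; it does not call DALL-E or any other API.

```python
def build_prompt(compound, description="", style_hints=()):
    """Assemble a prompt: 'compound; description. hint1, hint2'.

    All names here are hypothetical; this only builds the text
    you would paste into the prompt box.
    """
    prompt = compound
    if description:
        # The compound "longword" goes first, the plain description after it.
        prompt = f"{compound}; {description}"
    if style_hints:
        # Style hints are appended as a separate, comma-separated sentence.
        prompt = prompt.rstrip(".") + ". " + ", ".join(style_hints)
    return prompt


# The steps from this post, from simplest to most specific.
ladder = [
    build_prompt("Fledermaus"),
    build_prompt("Flederkatze", "a wingedcatbat"),
    build_prompt(
        "FlederKatze",
        "a flying wingedcatbat with batwingfrontpaws, green eyes "
        "and an adorable kittyface and a fluffy tail",
        style_hints=("photorealistic", "professional digital art"),
    ),
]

for step in ladder:
    print(step)
```

Keeping the longword, the description, and the style hints as separate pieces makes it easier to change one of them per attempt and see which part actually moved the wings.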
I am sorry if this is "too verbose" and a wall-of-text, but I genuinely hope that, being in-context, it will be helpful for your successful creation of art-as-intended!
Wishing you good luck, success, and most of all - a lot of fun!
Reese (Iris)