Teaching the AI a concept it's wrong about?

Heyo! Bit new here so bear with me.

I am trying to get DALL-E to draw my centaur explorer character. However, it frequently gets centaurs wrong. Only about 1 in 10 times do I get an actual centaur. Other times I get a human near a horse or a horse with a portion of a human torso on the shoulder or back.

What I’m trying to figure out is, once DALL-E has gotten it right, what do I need to include in a prompt to teach it that the result was correct and that it should stick to it?

Also, for the life of me, I can’t figure out why it always makes him super huge.

Here’s an example of a common failure mode I would like to train the AI away from: the explorer character pictured as a human riding a horse.

You should share your exact prompts so we have some hope of diagnosing the issue.

Edit: Did some tests, it seems as though DALL-E 3 really doesn’t know what a centaur is.

My guess is that there are just so many more images of people and horses than of centaurs that it gets confused.

Oh, of course, my bad:

This produced the failed image.

In an animated adventure cartoon style, depict a young centaur named Tavyn with long brown hair, leading a jungle safari expedition. Tavyn’s upper half is wearing a khaki safari shirt and an explorer’s hat. His lower half is sleek and black with distinctive white hooves, and he’s equipped with saddlebags. Accompanying the centaur are human men in safari gear, armed with hunting rifles and utility belts. The scene is rich with ancient trees, dappled sunlight filtering through the canopy, and an air of adventurous anticipation. The style should be vibrant, colorful, and characteristic of a classic animated jungle adventure cartoon. Tavyn is a centaur. He is not riding a horse.

This one tends to work well, but sometimes I get a horse and sometimes I get Tavyn with a horse head.

In a muted fantasy animation style typical of a jungle adventure cartoon, Tavyn is depicted as a centaur, with the upper body, arms, and head of a human and his lower body has the chest, hindquarters, legs, and tail of a horse. Standing among human men in safari gear, Tavyn is only 25% taller than the humans. He has chestnut hair, green eyes, and wears a khaki safari shirt and shorts, seamlessly blending with his black equine body with white hooves. He is holding a rifle, indicating readiness for the adventure. The scene is set in a lush jungle with towering trees, dense foliage, and soft light filtering through the canopy, creating an air of exploration and mystery.


Both of these came from the working prompt above.

I can back up that centaurs are still hit or miss in DALL-E 3… Unfortunately, there’s no way to “train” or “teach” it… I’m sure the next model will get a bit better. DALL-E 2 was horrible at dragons, but DALL-E 3 is a bit better…

It’s so much better than it used to be though, lol. I figured maybe if I found the right thing I could tell it “okay, now use this reference ID to make x decision about how to combine the human and horse elements from now on” or something like that.

I am basically trying to put together what would look like a title card for the animated adventure cartoon, if someone ever did one based off the books I’m writing, lol.

I tried “a mythological Greek centaur” and thought one of the two images could be a good starting point for further refinement.

The other sheds some light on the issue, maybe?

I suspect that part of my trouble is that the more details I try to provide to clarify the appearance of the human half or the horse half, the more it will trend toward thinking of them as two separate entities. So it might be easier to avoid getting a split creature with “a centaur” than “a centaur with long brown human hair, black-furred horse half with white socks and hooves.”
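One way to check that hunch would be to generate a "ladder" of prompts that starts with the bare subject and adds one detail at a time, then see at which rung the centaur starts splitting into a human and a horse. The sketch below only builds the prompt strings; `detail_ladder`, `DETAILS`, and `BASE_SCENE` are made-up names for illustration, and the wording of each detail is just an assumption loosely based on the prompts in this thread.

```python
# Hypothetical helper: build prompts with progressively more detail,
# so each one can be pasted into the image generator and compared.

BASE_SCENE = "in a jungle, animated adventure cartoon style"

# Assumed detail phrases, loosely taken from the prompts in this thread.
DETAILS = [
    "with long brown hair",
    "wearing a khaki safari shirt and an explorer's hat",
    "with a black horse body and white hooves",
]


def detail_ladder(subject: str, details: list[str], scene: str) -> list[str]:
    """Return prompts that add one detail at a time, starting with none."""
    prompts = []
    for count in range(len(details) + 1):
        # Keep the subject first, then the first `count` details.
        parts = [subject, *details[:count]]
        prompts.append(", ".join(parts) + ", " + scene)
    return prompts


if __name__ == "__main__":
    for prompt in detail_ladder("a centaur", DETAILS, BASE_SCENE):
        print(prompt)
```

Running each rung a handful of times and noting how often a proper centaur comes back should show whether extra detail really is what pushes it toward two separate entities.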

Huh. I’ll admit this is interesting. You’d think a model that’s meant to build creative things would be better at creative concepts like this.

I’ll also chime in and say it doesn’t know how to make satyrs well either. This post got me wondering if the results would be similarly difficult with a satyr, since they’re closely related.

I end up with a weird combination of Baphomet and a horse (a four-legged one). It’s funny, but not quite the mark.

Although, it also makes me wonder if the concept is difficult to generate because of the sheer amount of NSFW image content out there, and how that imagery (or lack thereof) affects its ability to accurately portray content. Satyrs were originally Greek myths that had raging boners, sooo, it’s certainly a possibility. Who knows what kind of shit is on the internet with centaurs. Even if it’s not meant to be sexual in any way, we don’t know what counts as inappropriate or how it’s handled in training or in the output images.

It could also be that a lot of the imagery of centaurs has shirtless people in it. On some occasions, you can get the model to generate someone shirtless, but I’ve noticed that’s usually a blip, not an easily demonstrable thing you can do. I find this relevant because a lot of centaur imagery features the human half as shirtless, which might genuinely cause problems here. It might even have all of that omitted entirely from its training, explaining why it has problems with the concept. There’s a lot we don’t know, but I have noticed that if any kind of visual framework can involve even harmless nudity like that, things get weird.

I tried seeing if I could make DALL-E make a “bara” character, and it’s pretty much a no-go at this point. Why? Because most of that kind of figure art features either shirtless characters or NSFW characters.

Call it a hunch, but I see similarities here, and I feel like that contributes to why it doesn’t seem to understand the concept very well. Even if you prompt for a character with a shirt on, if all the reference photos (shit in the training data, whatever) typically feature shirtless torsos, it’s not gonna know what to do. Or it might not have any reference photos / training data at all because of that, thus limiting its ability to accurately understand and depict the concept. We don’t know.

Wish you the best of luck though!

EDIT: The best satyr image I could muster without getting frustrated, as I typically do when I try DALL-E:

Like I said, Baphomet + horse (+ a Bard class, apparently)

Looking more like we are visiting a mad scientist’s island of genetic experiments:

The AI decided “Unlike traditional centaurs, this creature is a beautiful amalgamation of horse and human.” was a good prompt refinement to produce not-just-a-horse, so maybe it truly doesn’t understand.

In a bright fantasy animation style typical of a jungle adventure cartoon, Tavyn is depicted as a mythological centaur, with the light tan upper body, arms, and head of a human and his lower body has the chest, hindquarters, legs, and tail of a small horse. standing among human men in safari gear. The centaur has chestnut hair, green eyes, and wears a khaki safari shirt and shorts, seamlessly blending with his black equine body with white hooves. He is holding a rifle, indicating readiness for the adventure. The scene is set in a lush jungle with towering trees, dense foliage, and soft light filtering through the canopy, creating an air of exploration and mystery.

This one turned out alright. It still has a few quirks. Oddly, if I remove the reference to ‘shorts’, the image falls apart into a human and a horse. I haven’t found a great way around that yet. It seems something about that word is helping it figure out where to attach the human torso.
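Since dropping a single word like ‘shorts’ can make or break the image, it might help to test that systematically: produce one copy of the prompt per phrase, each with just that phrase removed, and try them all. Here is a minimal sketch of that idea; `ablate` is a made-up helper name, the sentence is an excerpt from the prompt above, and which phrases to remove is purely an assumption about what might matter.

```python
# Hypothetical helper: make one prompt variant per phrase, each with
# that single phrase removed, to see which wording the image depends on.

SENTENCE = (
    "The centaur has chestnut hair, green eyes, and wears a khaki safari "
    "shirt and shorts, seamlessly blending with his black equine body "
    "with white hooves."
)


def ablate(prompt: str, phrases: list[str]) -> dict[str, str]:
    """Map each phrase to a copy of the prompt with that phrase removed."""
    variants = {}
    for phrase in phrases:
        if phrase not in prompt:
            raise ValueError(f"phrase not found in prompt: {phrase!r}")
        # Remove only the first occurrence, then collapse leftover spaces.
        variants[phrase] = " ".join(prompt.replace(phrase, "", 1).split())
    return variants


if __name__ == "__main__":
    # Assumed candidate phrases; include the leading space so the
    # remaining punctuation stays tidy.
    for phrase, variant in ablate(SENTENCE, [" and shorts", " with white hooves"]).items():
        print(f"without {phrase.strip()!r}: {variant}")
```

Each variant still has to be run through the image generator by hand and judged by eye; the helper only keeps the wording changes consistent between runs.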

This is probably correct IRT NSFW content. If I stop specifying a shirt, it will draw him shirtless. He’s meant to be a civilized regular person though, lol, not so much a forest sprite, so I’ve gotta stick to my guns there. It definitely complicates the prompt though, as I’ve noticed that, depending on the clothes, it can cause the human part to “detach” from the horse part. I do get dinged every now and then for ‘content violations’ on the image gen side. Recently I’ve had closer to an 80% success rate with at least getting one ‘acceptable’ image back, but now and then I get a ‘nope, didn’t do that’, which I think is content driven.

Incidentally, horrible things happen if I ask for a dynamic pose of any kind, lol. Not super surprised there though.

A-ha, I was wondering if you were getting dinged with content violations too. That’s usually a signal it’s generating something under the hood, based off its training data, that violates the content policy somehow. Which sounds redundant, until you realize just how much stuff it blocks, and because you can’t see what it produced, you literally have to guess why. In this case, it sounds like DALL-E can’t envision a centaur with a shirt on very easily, and if it produces something without one, the theory goes it’ll have a higher chance of being blocked.

And this can just happen regardless of how SFW your prompt is too, I’ve noticed. Or at least how SFW the intended image should be. I remember trying to play around with some kind of cat-guy character (like the FF14 Miqo’te), but every time I specified short teal hair, it produced a hypermasculine dude with Hatsune Miku pigtails. So the model’s idea of a concept can be wildly different from yours.

Yeah, I still think DALL-E 3 needs a lot of work in these regards. Also, it’s difficult to tell where in the prompt to place the details of how you want something to look. If they changed their guardrails a lil bit, this might not be as big an issue.

Also, thanks @_j for making me laugh so hard I snorted.
