Recognizing and working with existing images

I’ve always wondered why the image generators say they cannot take an image and add to it directly…rather than just approximating something similar to it. After all, they can assess/analyze images…so I’m wondering if there’s a way to get them to do it. Here’s an example: I had one generate a simple spherical creature to use as the “original” or “default” version in an evolution simulator I’m compiling.


Not too bad. On a side note, though, this is worth pointing out: I asked what the motivation was for adding the two baby creatures side by side, and the generator GPT denied that they were even supposed to be mini versions of the creature at all…claiming they were just ideas for body parts or something.
Anyway, it seems it should be simple enough for the technology to assess the precise image and keep all factors constant while changing minor things (like making it taller and more oval…or a different color, etc.). But when I attempt to replicate these creatures, for example, none of the attempts even come close. Are there any tricks to influence success here?