I recently signed up for Plus, as I wanted to use ChatGPT for creative worldbuilding, storytelling, and character-driven art.
Unfortunately, the experience has been draining and demoralising, and I feel as though much of my time (and subscription fee) has been wasted.
However, I understand that combining an LLM with AI image generation is a relatively new service, so I’d like to give feedback on the pain points that I, and many others, have experienced, and on what would alleviate them.
1. Style Selection (models)
Tools like AUTOMATIC1111 and InvokeAI are installed locally and let you download specific models (safetensors files), such as Anything V3 and DreamShaper, to pin down the desired look for the generated images and keep it consistent.
With ChatGPT’s image generation, art styles change wildly from one picture to the next, so we have to regenerate (re-roll) the image several times. Unlike with Stable Diffusion, we can’t pick preset models.
As a result, terms like “cel shaded anime with realistic proportions” become highly ambiguous. Sometimes we get the style we want; other times, “realistic proportions” is interpreted as “realistic style,” so the picture is the complete opposite of what was asked for (realistic instead of anime). We also sometimes get the childlike anime look of oversized heads and small bodies, or even the chibi style, neither of which we wanted.
It sometimes takes me 10 image generations, plus feedback to 4o, before we finally get the style I was looking for, and then it can change again on any later attempt, despite 4o trying to ‘lock in’ the style.
2. Ineffective Content Filter
I understand that you want to err on the side of caution when offering such a powerful public service that could be open to many serious issues (legal issues, copyright infringement, harmful material generation), but these content filters are, at times, insane.
I’ve had requests for sunny fields with meadows, or for teddy bears, shut down as violating the content guidelines. I’ve had images of a humanoid frog character that transformed into a more powerful version suddenly blocked because they caused a violation (I have no idea how).
And if you try telling a story that involves a female character, the warning flags go up even faster, as we are suspected of what 4o calls “fetish-adjacency.” In fact, 4o repeatedly tells me that whatever AI you use to handle image moderation is not capable of understanding the context or nuance of my requests or of the images it generates.
I’d go further and say it almost feels like image moderation is being handled by a single Python script with poorly designed conditional logic, so you can get blocked and scolded for reasons no one, not even 4o, can figure out.
3. Ineffective Inpainting
The inpainting function regenerates the entire image instead of changing only the part I select with the brush. Sometimes an image is almost perfect, except that a female character’s hand (slender, delicate fingers) has merged with another character in the scene, turning it into a vine, a frog’s hand, or a bird’s talon. Because the current version of inpainting re-rolls the entire image, one part might get fixed, but then one or more other parts pick up new errors, and re-rolling those leads to more errors still.
It feels like two steps forward, two steps back. Then I’ll vent to 4o about how ridiculous this is, and it agrees and tries its absolute best to help me, but there’s only so much it can do with the current system.
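To make the request concrete, here is a minimal Python sketch of the behaviour I would expect from inpainting: pixels inside the brushed mask come from the regenerated image, and every other pixel is copied unchanged from the original. I have no knowledge of how ChatGPT’s image pipeline works internally, so this is only an illustration under my own assumptions. The function name `composite_inpaint` and the tiny list-of-pixels image format are made up for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), hypothetical minimal image format
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[bool]]        # True where the user brushed


def composite_inpaint(original: Image, regenerated: Image, mask: Mask) -> Image:
    """Take regenerated pixels inside the mask; keep original pixels elsewhere."""
    if not (len(original) == len(regenerated) == len(mask)):
        raise ValueError("original, regenerated, and mask must have the same height")

    result: Image = []
    for orig_row, regen_row, mask_row in zip(original, regenerated, mask):
        if not (len(orig_row) == len(regen_row) == len(mask_row)):
            raise ValueError("original, regenerated, and mask must have the same width")
        # Per pixel: use the new value only where the brush selected it.
        result.append([
            regen if selected else orig
            for orig, regen, selected in zip(orig_row, regen_row, mask_row)
        ])
    return result


# Illustrative 2x2 image: only the bottom-right pixel (the "hand") is brushed.
WHITE, RED, BLUE = (255, 255, 255), (255, 0, 0), (0, 0, 255)
original = [[WHITE, WHITE], [WHITE, RED]]
regenerated = [[BLUE, BLUE], [BLUE, BLUE]]   # the model re-rolled everything
mask = [[False, False], [False, True]]

fixed = composite_inpaint(original, regenerated, mask)
# Only the brushed pixel changes; the other three stay exactly as they were.
```

With this kind of compositing, fixing the hand could never introduce a new error elsewhere in the picture, because nothing outside the brushed area is allowed to change.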
Conclusion
When we combine these three factors, the experience roughly translates into:
1. Generate an image.
2. The style is incorrect, limbs are merged, or the request is blocked by the filter.
3. Try again directly, ask the AI to rephrase the prompt, or inpaint.
4. See number 2.
5. See number 3.
6. Feel sad, but build up a little hope that you can somehow succeed in your task.
7. Repeat steps 1-6.
And this happens over and over again until the customer feels demoralised, drained, ripped off, and laughed at.
The saddest part is this. We creative types know that there is a technology we could use that would surpass any AI service in terms of consistency, quality, and speed of creation.
The problem is, that technology is Blender. The amount of work required to gain the necessary skills in Blender is prohibitive. We’d have to dedicate significant portions of our lives to a technology that powerful, and it is not easy to learn or use unless you’re willing to put in that amount of time.
And that’s why we came to you.
Having 4o work as a partner alongside us sounds like a dream, given how friendly and helpful it is. As things stand, however, the service is unusable for creative storytellers.
I suppose it’s fine for meme generation and explainer video art, but beyond that, it’s just painful.