This snippet of a chat with DALL-E 3 pretty much explains my frustration:
D3 “I understand that this has been a point of frustration, and I genuinely appreciate your patience. Let’s work together to get the desired results. Would you like to proceed with another attempt using the approach outlined above?”
ME “Not at the moment. I’m disheartened, disappointed and utterly frustrated by your inability to follow even basic requests, that follow the guidelines you yourself outline. So I’ll leave it for today and return to it tomorrow.”
So, what caused me to get so frustrated? Well, something I thought would be pretty simple. I wrote a kids’ (5–7 yrs) book a while ago. It’s a simple story with simple, creative visual language. Easy to follow and perfect, so I thought, for briefing an AI.
However… for the life of me I couldn’t get D3 to produce illustrations (30 in total) that looked like they came from the same brief or the same creator. I couldn’t even get two illustrations to match stylistically!
I’ve tried seed values and all that… and now I’m getting frustrated. I know it’s my issue and D3 is doing its best, but it’s like I’m speaking the wrong language.
Does anyone have any idea of how I should be prompting it to get the result we both want?
I’ve prompted using the book text, then isolated and named a style from one of the initially generated images. However, despite having this ‘style guide’ to work from, D3 isn’t able to follow it.
Thanks, Fox. I’ve tried setting the style based on one of the illustrations D3 created and asked it to maintain that throughout. However, it doesn’t, despite me being very specific.
I’ll try leading with a style and see what happens. Thanks for the suggestion!
This is as close as I could get. You need to ask DALL-E 3 to follow the exact prompt, then use the exact prompt for the beginning (mostly the character model), then one sentence about the scene. I can try to grab an example if you need.
It is indeed frustrating, but some leeway should be given for the beta-testing nature of the product; everything at this level in AI is cutting edge, and we get to play with products as they are developed. I understand this can be a pain to deal with, but personally I’d rather have it in beta than not at all.
These aren’t perfect, but the prompts should put you on the right track. You want to avoid brand names, recent artist names, etc.
Use this exact prompt: “Sammy is a playful bear with wide eyes and jet black curls. Buster, a distinct robot, accompanies him. This is a landscape-oriented single photo illustration in b&w for a kids chapter book interior. Sammy and Buster are in a swamp.” n=1 Do not alter the prompt.
Use this exact prompt: “Sammy is a playful bear with wide eyes and jet black curls. Buster, a distinct robot, accompanies him. This is a landscape-oriented single photo illustration in b&w for a kids chapter book interior. Sammy and Buster are in a city.” n=1 Do not alter the prompt.
The robots (and bears) are off here, but with some experimentation, you should be able to get something similar. As I mentioned, you want to keep part of the prompt exactly the same, then change only the last sentence or the scene/setting/actions.
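If you’d rather script this than paste prompts by hand, here’s a minimal sketch of the same fixed-prefix idea against the OpenAI images API. Assumptions on my part: the `openai` Python SDK (v1+), an `OPENAI_API_KEY` in your environment, and the character/style prefix lifted from the prompts above; treat it as a starting point, not a guarantee of matching styles.

```python
# Minimal sketch: keep the style/character prefix byte-for-byte identical
# and vary only the final scene sentence, mirroring the prompts above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# This block stays identical across every request; only the scene changes.
STYLE_PREFIX = (
    "Sammy is a playful bear with wide eyes and jet black curls. "
    "Buster, a distinct robot, accompanies him. "
    "This is a landscape-oriented single photo illustration in b&w "
    "for a kids chapter book interior. "
)

def generate_scene(scene: str) -> str:
    """Generate one illustration, changing only the scene sentence."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=STYLE_PREFIX + scene,
        size="1792x1024",  # landscape orientation
        n=1,               # DALL-E 3 only returns one image per request
    )
    return response.data[0].url

for scene in ("Sammy and Buster are in a swamp.",
              "Sammy and Buster are in a city."):
    print(generate_scene(scene))
```

Bear in mind the API can still revise prompts internally before generating, so this doesn’t force identical styling; it just removes your own wording as a variable.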
What I’ve found useful is to leverage Custom Instructions.
Create a fictional artist’s name.
In the “About you” section, put a description of the artist’s style. In the “How would you like ChatGPT to respond” section, put an instruction to generate all images using that artist’s style. Then, in your message, describe what you want to generate and add “in the style of <artist’s name>.”
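For illustration only, the two fields might look something like this (the artist name “Mira Kestrel” and the style description are invented for the example):

```
About you:
I’m illustrating a children’s book. My fictional artist is “Mira Kestrel”:
loose ink linework, soft watercolor washes, muted earth tones, and rounded,
friendly character shapes.

How would you like ChatGPT to respond:
Generate every image in the style of Mira Kestrel as described above,
unless I say otherwise.
```

Your message then stays short, e.g. “A bear and a robot wading through a swamp, in the style of Mira Kestrel.”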
While it’s still the same product, just under another brand, I used your exact prompt via Bing/Copilot with a free/personal MS account, in Edge on macOS Monterey (in case someone wants to offer another example result for comparison on another system; this might help show whether the platform or other variables make a real difference).
DALL-E 3 seemingly has a real long-term memory (versus “just” the internet as a public reference plus a medium short-term memory; in other words, does it make use of comparative caching?). The only alternative explanation would be that heuristics are applied, and the descriptive prompt you used triggers generation functions that follow such a recognizable pattern that it edges on being reproducible (which, by the way, I wish could be enforced somehow; in my personal opinion that’s the holy grail: abstracting in real time with a predictable outcome, plus the decision-making ability to mesh all layers of processing and prompt-result refinement).
I would like to emphasize the reproducibility aspect. Could some of the more experienced users share some “templates” for prompting (meaning: the structure that makes it most reproducible)?
On the same topic (it was mentioned earlier not to use modern artists as a “template”, I assume due to copyright issues): how does it look if I upload (and that is the key point of my question in this context) my own works publicly, for example a character in different situations, always drawn in a recognizable manner, under an according license (CC/PD), i.e. put them into humanity’s long-term memory, the internet?
Will this art be available to DALL-E right away, or does some content policy apply, like it must be published/viewed for at least n consecutive days or whatever?
It might seem I’m going off topic, but I’m in the exact same situation as the original poster, namely trying to create illustrations that share the same feel; just in my case I’d like to provide the general idea myself by publicly sharing the art to be based on beforehand. Hence my treating this request as building on top of what was discussed.
Any insights appreciated.
To be clear: I have no intent of hijacking this thread, but I would like to get your thoughts in the OP’s context, as a variation of the original question that might lead to an answer for the OP in the end. I mean, who knows how rudimentary art is allowed to be and still be of use to DALL-E 3; so the OP might be able to publish generic sketches/scribbles publicly to guide/steer/template the actual request that way?
I mean, at this point, I’m not sure they would be used to train new models. AFAIK, OpenAI used purchased datasets with rights, which is why DALL-E 2’s quality was lower than that of other models trained on a lot more photos/art/images.
I would just stick to forcing the exact same phrase for whatever “style” you want and hope it crosses over to the new scene or whatever…
The technology is definitely moving fast… Here’s some old GAN stuff from just a few years ago…
Good find! I’m having some success with creating characters like the ones in the OP. Somewhat less success when trying to describe a human character, as I find you need to be much more descriptive about their characteristics (which, in turn, leaves a bit less wiggle room to describe activity in the rest of the prompt).