Hello everyone! I’ve been experimenting daily since the release of the DALL-E3 API. I’ve already spent over $70 on it. This time, I conducted various tests to investigate why the quality of the DALL-E3 API output is lower than that of DALL-E3 with ChatGPT.
I tested the prompt “Please create an illustration drawn in the Japanese anime style, featuring calm colors and fine lines. It should depict a strawberry shortcake placed on a table in a cafe.” in both English and Japanese, using both the DALL-E3 API and DALL-E3 with ChatGPT, and tried the hd, standard, vivid, and natural options.
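For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of one way to script it. It assumes the official openai Python package (v1 or later), whose images.generate call takes quality and style parameters and returns a revised_prompt field for dall-e-3. The helper names (build_requests, collect_revised_prompts, run_live) and the 1024x1024 size are illustrative assumptions, not something taken from the post.

```python
from itertools import product

# The English prompt quoted in the post.
PROMPT = (
    "Please create an illustration drawn in the Japanese anime style, "
    "featuring calm colors and fine lines. It should depict a strawberry "
    "shortcake placed on a table in a cafe."
)

QUALITIES = ("standard", "hd")
STYLES = ("vivid", "natural")


def build_requests(prompt):
    """Return one keyword-argument dict per (quality, style) combination."""
    return [
        {
            "model": "dall-e-3",
            "prompt": prompt,
            "size": "1024x1024",  # assumed size; not stated in the post
            "n": 1,               # dall-e-3 generates one image per request
            "quality": quality,
            "style": style,
        }
        for quality, style in product(QUALITIES, STYLES)
    ]


def collect_revised_prompts(client, prompt):
    """Call client.images.generate once per combination and map
    (quality, style) to the revised prompt reported in the response."""
    revised = {}
    for kwargs in build_requests(prompt):
        response = client.images.generate(**kwargs)
        key = (kwargs["quality"], kwargs["style"])
        revised[key] = response.data[0].revised_prompt
    return revised


def run_live():
    """Hypothetical entry point. Needs the openai package and an
    OPENAI_API_KEY in the environment; every call is billed, so it is
    deliberately not invoked here."""
    from openai import OpenAI

    client = OpenAI()
    for combo, text in collect_revised_prompts(client, PROMPT).items():
        print(combo, "->", text)
```

Calling run_live() makes four billed requests and prints the revised prompt for each option pair, which makes it easy to see how much of the style wording survives.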
The conclusion is that the DALL-E3 API excessively rewrites the prompt. While DALL-E3 with ChatGPT fairly accurately reflects the style requested in the prompt, the API significantly alters style-related instructions, rewriting them as follows:
“Create an image with calm colors, fine lines, and subtle details reminiscent of traditional Japanese motifs, in a style similar to traditional Japanese ukiyo-e from before 1912.”
“Create an image in the style of traditional Japanese print art, featuring calm colors and precise line work.”
“Create an illustration inspired by 19th-century Oriental art, with calm hues and delicate lines.”
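A rough way to flag this kind of style drift automatically is to check which style terms from the original prompt are missing from the revised prompt, and which unwanted terms have appeared. The sketch below uses only the Python standard library. The term lists and function names are illustrative assumptions; a simple keyword check will miss paraphrases, so treat it as a first filter rather than a reliable detector.

```python
import re

ORIGINAL = (
    "Please create an illustration drawn in the Japanese anime style, "
    "featuring calm colors and fine lines. It should depict a strawberry "
    "shortcake placed on a table in a cafe."
)

# The first rewritten prompt quoted above.
REVISED = (
    "Create an image with calm colors, fine lines, and subtle details "
    "reminiscent of traditional Japanese motifs, in a style similar to "
    "traditional Japanese ukiyo-e from before 1912."
)

# Assumed term lists, chosen by hand for this example.
TRACKED_TERMS = ("anime", "calm colors", "fine lines")
UNWANTED_TERMS = ("ukiyo-e", "woodblock", "19th-century")


def _contains(term, text):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def missing_style_terms(original, revised, terms):
    """Terms present in the original prompt but absent from the revised one."""
    return [t for t in terms
            if _contains(t, original) and not _contains(t, revised)]


def introduced_terms(original, revised, terms):
    """Terms absent from the original prompt but present in the revised one."""
    return [t for t in terms
            if _contains(t, revised) and not _contains(t, original)]
```

For the pair above, missing_style_terms(ORIGINAL, REVISED, TRACKED_TERMS) reports "anime" as dropped and introduced_terms(ORIGINAL, REVISED, UNWANTED_TERMS) reports "ukiyo-e" as added, which matches the complaint in the post.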
My specifications regarding the style should not pose any issues according to the system prompts of OpenAI DALL-E3, both in terms of their rules and legally. I haven’t specified any particular studios or artists, nor styles of specific artists created within the last 100 years. Thus, there should be no problem as I have only specified capturing the key aspects of the style.
Therefore, I believe there is an error in the design of the backend prompts. Could you please correct the backend prompts to avoid such rewrites? I’m really struggling to get what I ideally want.
We are aware that the behavior of the prompt rewriting is a little different based on the input language. I haven’t found it to be consistently better or worse though, just different.
I wonder if the above example is one where Japanese tends to do better, while on other examples English does. Did you try some other English / Japanese prompts to compare? (Also, it wouldn’t surprise me if using Japanese for more Japanese imagery works well, in the same way using prompts in Korean for Korean imagery may work well.)
Edit: looking at your specific example of it overly changing the style instructions, that’s pretty compelling. I’ve noticed it do this a little, but your examples are pretty clear. I’ll take this back to the team and see if we can do some prompt engineering to clarify it should add detail but not unnecessarily reword, or worse change the detailed meaning.
I’ve been using the “trick” to force it to use an exact prompt and have been having a lot better success. The bit about having enough description is important…
Yes, please look at providing a way for us to turn off the prompt revision. It should be optional for those who spend a long time preparing detailed prompts.
Thank you for your prompt response! I’m glad to hear from you. I have conducted numerous other experiments with English/Japanese as well. I haven’t felt that the prompts in Japanese and English change that much (since they end up being translated).
I believe that similar issues of problematic rewriting occur quite often, even when using English prompts. Moreover, adding “I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:” to the English prompt section of my experiment does not yield very effective results with the DALL-E 3 API… (substantial changes in the style of the images occur).
To make the issue clearer, I will share all the screenshots. Thank you very much. I am hoping for a good outcome!!
I ran these prompts on Tuesday, and the revised prompt was exactly what I put in, minus the prefix escaping prompt from the docs. I will run another example or two and post it here so everyone has the latest status.
Experiments with the following prompts result in the following outcomes. There is still a rewrite in the style of painting. This does not occur with DALL-E3 with ChatGPT. As a solution, we understand that further persuasive prompt engineering regarding the style can resolve this, but it is not perfect.
“Please create an illustration drawn in the Japanese anime style, featuring calm colors and fine lines. It should depict a strawberry shortcake placed on a table in a cafe.”
vs
“I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS: Please create an illustration drawn in the Japanese anime style, featuring calm colors and fine lines. It should depict a strawberry shortcake placed on a table in a cafe.”
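To compare the two variants systematically, one option is to prepend the prefix in code and then measure how close the revised prompt returned by the API is to the prompt without the prefix. This is a minimal sketch using difflib from the Python standard library. The function names and the 0.9 threshold are arbitrary assumptions for illustration, and character-level similarity says nothing about whether the meaning changed.

```python
from difflib import SequenceMatcher

# The prefix quoted in the thread.
AS_IS_PREFIX = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)


def with_as_is_prefix(prompt):
    """Return the prompt with the 'use it AS-IS' prefix prepended."""
    return AS_IS_PREFIX + prompt


def _normalise(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def similarity(original, revised):
    """Character-level similarity in [0, 1]; 1.0 means identical after
    normalisation."""
    return SequenceMatcher(None, _normalise(original), _normalise(revised)).ratio()


def was_rewritten(original, revised, threshold=0.9):
    """True when the revised prompt differs noticeably from the original.
    The threshold is an assumed cut-off, not a documented value."""
    return similarity(original, revised) < threshold
```

In use, send with_as_is_prefix(prompt) to the API and pass the unprefixed prompt plus the returned revised prompt to was_rewritten, since a successful run returns the prompt without the prefix.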
Your prompts are way too short. You’re prompting like Midjourney. That doesn’t work for Dalle. To achieve consistency with dalle you need extreme detail to anchor the image and reduce the variability that can occur when the description is more vague. Try this prompt instead with “the trick”. The results should be more consistent. They are for me.
Craft a detailed illustration in the Japanese anime style, characterized by calm, pastel colors and fine, precise lines. The central focus is a strawberry shortcake, meticulously placed on a table in a quaint cafe setting. The cake is three layers high, each layer a delicate sponge cake with a light, golden-brown hue, visible at the edges. Between the layers are generous swirls of rich, creamy white frosting, with a slightly glossy sheen that catches the light.
The top of the cake features a precise arrangement of fresh strawberries: five strawberries cut in half, their vibrant red surfaces gleaming, with tiny seeds and fresh green leafy tops still attached. They are arranged in a star pattern. In the center of this star is a dollop of whipped cream, perfectly peaked and so white it almost sparkles. The cake sits on a simple yet elegant white porcelain plate with a thin gold rim.
The table is a classic, polished wood surface, reflecting a soft light that enhances the serene ambiance of the cafe. The background should be softly blurred, but hints of other cafe elements like a steaming cup of tea, a folded newspaper, and a vase with a single, blooming flower can be seen. The overall effect is tranquil and inviting, with a focus on the simplicity and beauty of the shortcake.
Attention should be given to the textures in this anime-style drawing: the fluffy softness of the cake, the creamy glossiness of the frosting, the juicy freshness of the strawberries, and the smooth, reflective surface of the porcelain plate. By detailing these elements, the strawberry shortcake will remain distinct and consistently recognizable in each illustration, capturing the essence of a tranquil Japanese anime cafe scene.
It’s not perfect, but I’m getting pretty consistent images. At least 1 of 15 is going to be good. Sometimes it rewrites the prompt even with “the trick”. Those are the images that aren’t consistent.
I am profoundly frustrated by the lack of access to the seed and the inability to disable prompt revision in the updated Dall-E model. I make use of extremely detailed prompts in an attempt to keep a consistent style (a task that wouldn’t require so much effort if we could specify the seed). However, the model is continuously making completely arbitrary stylistic decisions when expanding my prompts, which entirely contradict the design choices I am trying to make. I have tried the “hack” mentioned in the docs to persuade the model to exercise a light touch with prompt revisions, to no avail. To give a clearer view of what I’m experiencing: I am currently trying to achieve a 3D-animation style akin to Pixar films. I’ve used the following stylistic guidance:
Emulate the 3D-animation style of blockbuster, 21st-Century children’s films. Use clean, sharp lines; vibrant, saturated colors; dynamic, cinematic lighting; photo-realistic textures and three-dimensional depth. Characters should emulate the style of these films as well, featuring exaggerated heads with large, expressive eyes; compact, child-like limbs and bodies; and soft, friendly features. Animal characters should be anthropomorphized. Pay meticulous attention to details in both characters and the surrounding environment.
Despite this, the revised prompts have referenced everything from “19th century children’s book illustrations,” to “extremely detailed wood carvings,” to “late 19th-century, Japanese Ukiyo-e woodblock prints.”
What gives? OpenAI has to ship some kind of a solution for this issue, because right now, it simply isn’t possible to exercise any real stylistic control.
In order for DALL-E 3 to really be commercially viable at scale, we definitely need access to the seed, a global gen_id (would be extremely helpful), and some mechanism for reducing prompt rewrite strength. It’s neither scalable nor cost-effective to have to generate a variety of options and run some manual selection process to achieve the desired style/subject. I need a desired and reproducible style (coming from prompt + seed + gen_id) to be able to really use this thing in my application. There’s no way they don’t fix this, or else DALL-E 3 will be nothing more than a fun (and expensive) plaything, not a commercially viable tool.
I’m having the same experience. I don’t even think it is prompt rewriting. It looks like it is using a different model, even with the exact same detailed prompt. Image 1 - ChatGPT D3. Image 2 will be in the next post, D3 API.
I tried various examples, and they all show major differences between image generation via the ChatGPT web interface and via the API.
I’m having the same problem. It’s like it’s using a completely different, incredibly low-quality model for the API. Nowhere near the results I get on Bing. Any official word from OpenAI? I paid money for this.
Same longstanding issue. Very clear difference between GPT and API images, can be easily verified by submitting same prompt to both. The main problem for me has been that I use GPT to prototype the prompting, but then once something is working consistently, it does not translate well over to the API. So it feels like you have to start over somewhat with the API, and that can get quite costly.