Why the Quality of DALL-E3 API is Significantly Lower Compared to the Original

@yk.kazuyuki thanks for opening this thread and investing so much time and money, i was hunting down the same issue and used this thread here as guidance. I ended up with a fake ChatGPT+ mode, so to speak. I implemented a combination in wherein i let GPT4 prompt the DALL-E 3 API, to generate a very detailed 500-600 word description. What i noticed is that the DALL-E 3 API seems to love longer descriptions but at the same time skip over details if they are mentioned just once. To get important details in, they need to be sometimes hammered down by mentioning them more then once, it’s a bit annoying.

However, i compiled all suggestions here and my GPT4 + DALL-E 3 approach into an open-source app to enable quick testing of prompts and see the results (modifications) also with the tipps mentioned here, the app is up on GitHub under the name ‘simple-dalle-ui’.

3 Likes

I have been using Dall-E3 for months, and I have noticed a considerably decrease of its quality and complexity of images, even when using same prompts. For example, the first image I created back in October, while since late December I have not been able to do similar quality images - adding two I have tried to create today.

This has been such a pitty, quality of images overall has been considerably low, and even when trying around other promts, approached, nothing work - it seems it is an structural issue that prevent it from doing same quality work as before.

Is this something that OpenAI has recognized, the lower quality/complexity of outputs.

I belive it is exactly the opposite: Massive prompt rewriting happens on the chatGPT/Dall-E3 side when using the web interface. Answer from chatGPT w.r.t. the lower quality / style deviation when using DALL-E 3 with API:

“When using DALL-E via its API, the prompt preparation process is generally up to the user or the system integrating with the API. The API itself does not modify or elaborate on the prompts; it generates images based on the exact text input it receives. This means that for API usage, the responsibility of crafting a detailed, effective prompt falls on the developer or the end-user who is making the API call.”

When asked for the system prompt chatGPT used for image creation via the web interface, it provides a system prompt that massively deviates from the original input. Thus, optimization takes place that is invisible to the user. Even the seed information doesn’t make the API results way better.

I tried to send the prompts intended to be used with DALL-E3 API in a preparational step to chatGPT-4, ask for optimization specifically for DALL-E 3 via API. It gets better, but a lot of unexpected style and content deviation happens anyway. It’s a massive loss of money and compute power for 25-30% accuracy …

1 Like

Suggestion: try using Typing Mind with an API. If you work with a Mac, you can get it through Setapp.

Just in case you were wondering, the above is a blatant mistruth by ChatGPT.

There is an AI between you and the API DALL-E 3 that does similar filtering of content and rewriting.

DALL-E 3 wide images are $0.08 each. A $20 monthly API bill of those is 250, eight images a day, while ChatGPT Plus at $20/mo lets you triple that or more, and still ask other questions to GPT-4. So OpenAI’s product competes with and outprices their own developers.

Your choice is really dependent on how much you would actually use image generation personally.

That “Typing Mind” costs $59 if you want to generate images, with your own damn API key. You should NEVER provide your API key to unknown third-parties.

Use Python. I gave an elaborate script that only takes a bit of understanding of the input variables you’d modify in this topic:

1 Like

Thank you for clarifying the hallucination of chatGPT when it comes to prompt rewriting. As i have to create between 50-100 images per task run, i use a Python script for accessing the API and to create the images and store them. Anyway, the quality of the images is way better on the web frontend then through the API. That’s unfortunately a fact for my setup. With this amount of images you can’t do it manually on the webfrontend …

And i tried the various parameter options in my scripts as well. The most important feature - for a production system - to keep the same style and quality was the seed nr, which was was removed from the API. So i am required to constantly play around with the prompt adjustments and optimization with chatGPT (which is prone to errors, as the prompt gets rewritten or “seasoned” anyway). Whatever the difference in the prompt rewriting technology in one or the other case is.

1 Like

I have the same problem, the quality of images generated from api directly is simply much much worse than the one from bing image generator too. Even when I take the revised prompt and then feed it to bing! It is almost as if openai has put a limit on dalle through api, to encourage user subscription to chatgpt+.

EDIT: I am not yet sure if this is 100% correct, but from what I can observe it seems to be a bug with style=“natural”. DALLE3 API poor quality even considering prompt revision - #4 by 1004wsh Which style did you use?