I don’t have that many. The princess pictures are divided into the following colors: pink, black-purple, purple, and wine red. There are only 4,909 pictures in total. And I found it embarrassing when I couldn’t get a usable picture out of Bing Image Creator. What do you draw on pixiv? Please show me a picture; I’m very curious.
The skirts in these pictures are normal in style, but the material rendering is poor; the model does not seem to understand that the material should read as cloth. The lines are thinner and the character understanding is still acceptable, but the rendering style is still wrong. The exposure of the moon is bad: if they must use this moon, the exposure needs to be adjusted and more attention paid to light and shadow. And it still doesn’t have the right feeling.
2023/12/28
This image was generated on December 28, 2023. At that time the prompts were well understood and the light-and-shadow effects were acceptable. Now the output is completely unacceptable: although the current images have begun to improve, the final render lacks the dreamy Japanese anime CG feeling it should have, and that feeling is the core expression of this type of image.
I think there is still a problem with the rendering logic, perhaps with light and shadow. Right now DALL-E 3 does not seem to know that ambient light alone is not a suitable reference for lighting characters, because characters need ray tracing that is independent of the environment.
Yes! Microsoft’s decision not to let Copilot control Bing Image Creator was a strategic mistake. Bing must be restored to the state it was in before January 2024; only then can Bing Image Creator be truly restored. And DALL-E 3 does not necessarily get the best prompt wording when the model is used directly on its own. I have discussed this with ChatGPT and Gemini 2.0, and overall this has been confirmed: if the NLP layer cannot parse and polish prompts, and make progress in prompt engineering, it will only waste electricity and GPU resources.
Although the composition and character understanding of these incorrectly generated images are fine, the final style of the images is not as delicate and refined as real Japanese comics. Moreover, these images feel dull, with the feeling of a mimeograph rather than of Japanese comic CG.
Grok:
This is the original Chinese prompt: 在白墙房间为背景的美丽的日漫风格的粉色的缎面婚纱一样的华丽宫廷礼服的粉色披肩长发的日漫风格美少女,她叫玛丽安妮。
她裙子的外面是有缎带蝴蝶结点缀的,裙子的丝绸质感很光泽外侧有罩纱。像是公主一样美丽。
(In English: a beautiful anime-style girl with long pink hair over her shoulders, wearing a gorgeous pink court gown like a satin wedding dress, against the background of a white-walled room; her name is Marianne. The outside of her skirt is decorated with ribbon bows, the silk of the skirt is very lustrous, and there is a sheer overskirt on the outside. She is as beautiful as a princess.)
I’ve been experiencing the same exact problem with DALL-E on ChatGPT since the middle of June. They silently lowered the quality and didn’t bother changing it back for six months. It’s about time people realize that the quality has been lowered on ChatGPT since June.
It’s interesting to consider how far back we can roll back to previous models. I imagine it would be somewhere after May, but rolling back about two months would be appreciated.
You must have created many beautiful scenes of maple leaves falling in autumn, cherry blossom blizzards, and flower petals drifting in the air. Did they dance as if they were stamped patterns? When they settled on the water, did they retain their shapes amidst the ripples on the surface? If your aesthetic sense is average, the point at which you should revert to the previous version is clear.
January 12, 2025, that is, two months after the failure. If this failure is not fixed by then, I feel it is better to keep Bing Image Creator and OpenAI’s DALL-E 3 only as backup or occasional-observation options and instead turn to fine-tuned Stable Diffusion models, for example AnythingV5, or to Google ImageFX. Google’s progress is evolving at a visible rate, and its image generation quality is steadily improving, including its understanding of painting styles and fine-grained prompts and its automatic addition of negative prompts to fix logical problems. This is exactly what DALL-E plus ChatGPT cannot do right now. DALL-E 3 and Bing Image Creator used to be good, but that was seven months ago. So let’s wait until that date and see the concrete results. After all, the failure has been present since November 11th, and there is not much more we can do: I describe the details of 125 pictures to Microsoft every day, explaining exactly what is wrong with each rejected image, and negotiating with ChatGPT every day also seems useless. Maybe it is time to decide whether to leave this system. After all, if DALL-E does not work properly, it is better to leave it to those who still need it.
If the situation has not changed by then, I will just leave quietly. There is no need to say any more.
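For reference, the switch to a Stable Diffusion fine-tune mentioned above can be scripted locally with the Hugging Face diffusers library. This is only a minimal sketch: the repository id for the AnythingV5-style checkpoint is a placeholder (substitute whichever checkpoint you actually use), and the negative prompt simply illustrates the kind of "block the eerie rendering" terms discussed in this thread.

```python
# Minimal sketch: running an anime-style Stable Diffusion fine-tune locally with
# the Hugging Face diffusers library. The model id below is a placeholder for an
# AnythingV5-style checkpoint; replace it with the checkpoint you actually use.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "your-username/anything-v5",   # placeholder repo id (assumption)
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = (
    "anime style princess, long pink hair, satin court gown with ribbon bows, "
    "soft dreamy lighting, white-walled room background"
)
# Negative prompts let you block the failure modes described in this thread
# (eerie palettes, flyer-like rendering, text artifacts) explicitly.
negative_prompt = "eerie colors, oversaturated, overexposed moon, text, watermark, lowres"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("princess.png")
```

Unlike the current Bing/ChatGPT pipeline, this route exposes the negative prompt and sampling settings directly, so the kind of constraints discussed above do not depend on an opaque NLP layer.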
Here is the analysis document, translated into English:
Bing Image Creator and OpenAI Invocation Logic Problem Analysis Report
Background and Problem Overview
Starting from the end of June 2024, the generation quality of the DALL-E 3 model gradually deteriorated, with November 9, 2024 identified as the last date of normal operation. After that date, system-generated images began to exhibit logical flaws and stylistic issues. In particular, on December 4, 2024, when OpenAI’s DALL-E 3 PR16 update was fully deployed to the platforms (including Microsoft’s Bing Image Creator), the problems intensified significantly, leading to a complete collapse of generation logic and image quality.
After a month of intensive testing and troubleshooting (12 hours a day, thousands of images across platforms), it was found that the core issue is not a major defect in the DALL-E 3 model itself, but rather problems in the front-end NLP parsing layer (the prompt interpreter) and the intermediate invocation logic, possibly erroneous new logic introduced by the PR16 update. These issues affect both Bing Image Creator and OpenAI ChatGPT’s image generation module, with the following specific manifestations:
Problem Manifestations
- Style Abnormality and Style Shift
  - The overall style of the generated images does not match the description in the prompt; especially under prompts such as "anime (2D) style" or "anime princess," the output tends to drift toward other styles (such as 1970s sci-fi colors, abstract watercolor, eerie oil painting, flyers, or low-quality comics).
  - There are problems with color rendering in the images, such as:
    - Color distortion: over-saturation, bleaching, greenish tones, and other abnormalities.
    - Light and shadow issues: overexposure, unnatural starburst effects, or unnatural shadow distribution.
- Material and Detail Rendering Collapse
  - The texture and material of fabrics (such as lace on skirts, or silk texture) are parsed incorrectly, resulting in excessive patterning or loss of layering.
  - Unreasonable texture stacking is common in the images (e.g., excessive decorations, or structures that are complex but wrong). Note: this had improved by December 31st, but only in terms of clothing style and character understanding, and the improved composition does not affect the final result. Beautiful composition and well-understood clothing are still rendered in eerie styles and colors, and texture errors still occur at times, such as a moonlit wasteland being interpreted as a cemetery. Even when the anime character style and clothing are preserved, the output looks like a cheap flyer, with strange color casts and severe color bias. It cannot demonstrate the high-quality output that DALL-E had in June 2024.
- Generation Logic Defects
  - The details in the images show a high degree of error:
    - Text embedding issues: meaningless text or symbols appear on the moon’s surface or on other objects.
    - Background and subject conflict: the styles of the background and the subject are seriously inconsistent, especially for complex prompt descriptions (e.g., "an anime-style gothic princess standing in a courtyard full of blooming flowers").
  - The composition and subject layout of the images are still good, but the rendering quality has seriously declined. Specifically, the pre-processing of characters, the clothing logic, and the compositional aesthetics are still strong, but once the final rendering is complete only the appearance of the characters and clothing remains normal, while the overall image style becomes eerie and monotonous. In some cases texture-mapping errors occur. Even when the anime character style and clothing are preserved, the output looks like a cheap flyer, with strange color casts and severe color bias. It cannot demonstrate the high-quality output that DALL-E had in June 2024.
- Multi-platform Consistency Issues
  - The outputs of Bing Image Creator and of OpenAI ChatGPT’s DALL-E 3 module are consistent with each other, and both exhibit the problems above.
  - However, other third-party platforms (such as coze.com and dalle3.org) maintain normal generation quality when calling DALL-E 3, indicating that the core model (DALL-E 3) is unaffected and that the problems should be attributed to logic errors in the NLP translation layer and the invocation system.
Generation Logic Speculation
Based on long-term observation and analysis of Bing Image Creator’s generation behavior, the following issues are speculated to be the core reasons for its generation pattern (a pseudocode sketch of this suspected flow appears at the end of this section):
- API Interface Multi-stage Generation and Splicing Logic
  Bing Image Creator may generate images through the following steps:
  - Stage 1: call the DALL-E model to generate an image of a single subject (e.g., a character).
  - Stage 2: splice the generated subject into a generic background template.
  - Stage 3: use post-processing logic to integrate and embellish the image, attempting to present the overall meaning of the prompt.
  This process can lead to:
  - inconsistency between the background and the subject;
  - unnatural detail and light-and-shadow treatment of the character;
  - flaws or logical errors at the splice points.
- Excessive Intervention of Fault-Tolerance Logic
  Bing Image Creator may use a small model to perform fault-tolerance repair (or supplementary splicing) on failed images, but that model’s capabilities are limited, resulting in the following issues:
  - Detail logic collapse: the system cannot correctly render complex prompt descriptions (such as clothing layers or materials), leaving image details overly rough.
  - Patchy visual effects: the result appears complete but is actually several elements spliced together, unable to achieve a unified artistic style.
- NLP Translation Layer Issues
  ChatGPT’s NLP layer is the core module for prompt parsing, but after the PR16 update its ability to interpret prompts deteriorated significantly:
  - Complex prompt parsing failure: prompts involving multi-layer descriptions (materials, styles, light and shadow, etc.) are decomposed incorrectly or ignored, so the results do not meet user needs.
  - Decline in style understanding: for specific style rendering (such as the anime/2D style), the parsing layer fails to convey the user’s requirements correctly to the DALL-E 3 model.
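To make the speculation above concrete, the suspected flow can be expressed as a small runnable sketch. Everything here is hypothetical: none of these functions correspond to Microsoft’s or OpenAI’s actual implementation; they only show how a parse-then-splice-then-patch pipeline could produce the observed background/subject mismatches, splice artifacts, and heavy-handed post-processing.

```python
# Hypothetical sketch of the suspected multi-stage Bing Image Creator pipeline.
# None of these stages correspond to a documented Microsoft/OpenAI API; the stubs
# only make the speculated parse -> generate -> splice -> patch flow explicit.
from dataclasses import dataclass


@dataclass
class ParsedPrompt:
    subject_description: str
    scene_description: str


def nlp_parse_and_rewrite(user_prompt: str) -> ParsedPrompt:
    # Suspected failure point: complex, multi-layer prompts are decomposed
    # incorrectly, so style and material cues get dropped or distorted.
    subject, _, scene = user_prompt.partition(" in ")
    return ParsedPrompt(subject_description=subject,
                        scene_description=scene or "generic room")


def dalle3_generate(description: str) -> str:
    # Stage 1: render only the subject (placeholder for the real model call).
    return f"<subject image for: {description}>"


def splice(subject_image: str, background_image: str) -> str:
    # Stage 2: compositing the subject onto a generic background template would
    # explain style mismatches between subject and background.
    return f"<{subject_image} pasted onto {background_image}>"


def fault_tolerance_touchup(image: str) -> str:
    # Stage 3: a small "repair" model patching seams and boosting contrast would
    # explain the cheap-flyer color casts and glow effects.
    return f"<touched-up {image}>"


def generate_image(user_prompt: str) -> str:
    parsed = nlp_parse_and_rewrite(user_prompt)              # NLP layer
    subject = dalle3_generate(parsed.subject_description)    # Stage 1
    background = f"<template background: {parsed.scene_description}>"
    composite = splice(subject, background)                  # Stage 2
    return fault_tolerance_touchup(composite)                # Stage 3


print(generate_image("an anime-style gothic princess in a courtyard full of blooming flowers"))
```

If the real pipeline resembles this, an error in the very first step (the NLP rewrite) would propagate through every later stage, which matches the observation that composition survives while style and rendering collapse.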
Core Causes and Timeline
- June to September 2024: sporadic generation issues (such as lace texture errors) began to appear, but the overall effect remained stable.
- November 9, 2024: the last date on which high-quality images were generated normally.
- December 4, 2024: after the PR16 update, problems erupted comprehensively, and generation logic and rendering quality seriously declined.
Analysis and Conclusion
- DALL-E 3 Core Model’s Capabilities Unaffected
  When platforms like coze.com call DALL-E 3, the output maintains consistently high quality, indicating that the model can still parse and render user prompts correctly. However, DALL-E only produces high-quality results when a model such as Gemini 2.0 is used to write detailed English prompt descriptions, negative prompts are added to block erroneous effects such as eerie moons, and lighting and clothing details are specified explicitly. Even then, quality has declined significantly compared with June 15, 2024. This issue was prominent throughout a month of continuous testing. (A minimal script illustrating this prompt-enrichment workaround appears after this list.)
- Problems Concentrated in the Middle Pipeline Logic
  Bing Image Creator’s and ChatGPT’s invocations of DALL-E 3 show the same problems, possibly due to:
  - NLP translation layer errors: prompts are distorted during parsing and decomposition.
  - Post-processing logic issues: Microsoft may have introduced splicing or fault-tolerance mechanisms, degrading the detail and overall quality of the results.
- PR16 Update as the Catalyst for Problems
  After the PR16 update, the degradation of rendering logic and the style-shift issues intensified significantly, indicating that this update included changes to prompt parsing and rendering logic.
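As a stop-gap, the prompt-enrichment workaround described in the first point can be scripted against the public OpenAI Images API. This is only an illustration under assumptions: Bing Image Creator has no public API, DALL-E 3 has no true negative-prompt parameter (so the "avoid" cues have to be embedded in the prompt text), and the exact constraint wording below is the author’s own example, not a fixed recipe.

```python
# Minimal sketch of the prompt-enrichment workaround: take a short user prompt,
# append explicit style / lighting / "avoid" constraints, and send it to DALL-E 3
# through the public OpenAI Images API. DALL-E 3 has no separate negative-prompt
# field, so the constraints are embedded in the prompt text itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def enrich_prompt(short_prompt: str) -> str:
    constraints = (
        " Japanese anime CG style, soft dreamy lighting, natural exposure on the moon,"
        " silk and lace rendered as fabric."
        " Avoid: eerie color casts, oversaturation, flyer-like rendering,"
        " text or symbols on objects."
    )
    return short_prompt + constraints


response = client.images.generate(
    model="dall-e-3",
    prompt=enrich_prompt("An anime-style princess in a pink satin court gown, white-walled room"),
    size="1024x1024",
    quality="hd",
    n=1,
)
print(response.data[0].url)
```

This only works around the symptom: it moves the prompt-polishing burden from the broken NLP layer onto the user (or onto a helper model such as Gemini 2.0), which is exactly the point made in the analysis above.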
Recommendations and Improvement Directions
- Optimize NLP Translation and Prompt Parsing Logic
  - Review the changes to the NLP layer after the PR16 update and fix prompt parsing errors.
  - Improve the decomposition and transmission of complex prompts to ensure that rendering logic aligns with user needs.
- Compare and Analyze Multi-platform Invocation Logic
  - Refer to the invocation methods of platforms like coze.com and investigate how the invocation logic of Bing Image Creator and ChatGPT differs.
- Adjust Splicing and Fault-Tolerance Strategies
  - Optimize post-processing logic so that unnecessary splicing and fault-tolerance patches do not degrade image quality.
- Increase User Feedback Channels
  - Collect more user sample images and prompt cases for targeted optimization of rendering issues.
Conclusion
The rendering problems of Bing Image Creator and OpenAI’s invocation of DALL-E 3 are the result of systematic errors, not a defect in model capabilities. Before the problems are resolved, it is recommended that users switch to other platforms that invoke DALL-E 3 (such as coze.com) for a better experience. At the same time, it is hoped that Microsoft and OpenAI can quickly fix the relevant issues and restore the system’s normal functions.
Except for the third picture, all of them are junk.
These are the latest examples of error pictures. There are too many of them, so I picked a few representative ones. The error rate is close to 100%.
I’ve also noticed that sometimes an image is generated that corresponds to the old model. It seems that on some servers, the old weighting data is still being used, which is why occasionally an image of the older, better quality is generated. The moon is the most obvious object that shows the difference between the models. Simply put, the images from the older model have a more refined, balanced character, while the new images look as if they were edited by an untalented image manipulator trying to boost quality with cheap tricks. People without artistic talent tend to overdo effects, thereby ruining both the quality and informational details.
I’m not sure if this is the NLP system. It likely causes errors such as electric lights in a forest, meaning objects that don’t belong in the context but don’t necessarily affect quality. And maybe it also adds these nonsensical backlights and back-glows to everything.
The quality reduction in the images is most likely determined by the training data, especially if images of poor quality were included in the training. Or perhaps the model was reduced to generate faster results, and tricks like upscaling and sharpening were used to compensate for the quality loss with cheap methods.
The terrible glowing effects behind various objects, which destroy all realism and make the images look like bad photo edits, likely stem from this. So, the quality issue probably lies either in the weighting of the training data, a model reduction and cheap tricks to compensate, or both.
I’ve also started testing some other models, and I must say that OpenAI is currently falling behind in terms of development. DALL-E is still ahead when it comes to creative, unusual results, but it doesn’t offer any options that have become standard with other image generators by now.
When testing coze.com today I found that the precise, detailed prompts written by Gemini 2.0 are no longer working very well. It seems they are modifying some model weights or something, and this has affected third-party tools as well. I also found today that ChatGPT’s short prompts sometimes make the results even stranger, and that the generated anime characters are getting uglier and uglier; I don’t know why. I won’t touch it for now, but have you tried searching on pixiv to see what the creators who use Bing Image Creator are posting now? I feel the quality of their images is getting worse and worse, and many of them look cheaper and cheaper. Maybe something has been modified. And, like you said, sometimes you can still get a good image, but today I didn’t get a single good one all day, so I knew something must be wrong. Be prepared: if it really doesn’t work, go to Google ImageFX.