Relearning and understanding in the field of DALL·E

DALL·E is not just a visualization tool.

I’ve noticed that DALL·E’s technical development seems to have fallen significantly behind, especially since SORA was launched. Many people might think that DALL·E became unnecessary once they saw what SORA can do. However, I’ve found evidence to the contrary, which matters all the more because DALL·E acts like a parameter for SORA. The various anomalies highlighted in SORA’s introduction clip are the result of developmental imperfections. We can address these issues collaboratively through DALL·E, for several reasons:

  • Language Understanding: According to OpenAI’s research report “Video generation models as world simulators”, the re-captioning technique introduced in DALL·E 3 is applied to videos. At present, DALL·E struggles to distinguish left from right. How effectively can SORA operate under the same limitation?

  • Real-Time Learning: ChatGPT learns through RLHF from conversations with users. Many have likely noticed an improvement in how ChatGPT gives feedback, as OpenAI appears to be developing RLHF into a more efficient system. This development could almost entirely replace the need to import new training data. The training data is updated only about once a year, yet the model can still handle new stories, which suggests that widespread use of DALL·E can improve SORA’s language understanding.

  • Spatial and Dimensional Understanding in ChatGPT-DALL·E: ChatGPT-DALL·E has shown a high level of learning about direction, space, and dimension. Not long ago, I had to correct it with “I think you show the image on the wrong side. Can you flip it?” The new image continued from a previous one, and together they simulated the interior of an L-shaped building. If each image represented one arm of the L and the user stood at its corner, the revised second image showed the area outside the window as open space on the inner side of the L, an area that had not yet been depicted or requested. It was as if the model understood from the overall picture what that area should look like. I initially wondered why techniques for sensing and understanding changes in dimension and direction at this level were not being used to drive vehicles. Discovering the relationship between SORA and DALL·E showed me why this capability is needed and what the future of AI from OpenAI might hold. However, DALL·E is still confused about left and right, which indicates that its interpretation and understanding of direction, space, and dimension remain poorly developed.
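
One way to probe the left/right confusion described above is to send the model pairs of prompts that differ only in the direction word, then compare the resulting images by eye. The following is a minimal sketch under assumptions: the object names, the prompt wording, and the helper names `make_prompt` and `mirrored_pairs` are all hypothetical, and the script only builds the prompt text. It does not call DALL·E or any other API.

```python
from itertools import combinations

# Hypothetical probe objects; chosen only because they are easy to tell apart.
OBJECTS = ["red cube", "blue sphere", "green cone"]

# Each direction phrase maps to its mirror image.
MIRROR = {
    "to the left of": "to the right of",
    "to the right of": "to the left of",
}


def make_prompt(first, relation, second):
    """Build one plain prompt that states a single spatial relation."""
    return f"A {first} {relation} a {second}, viewed from the front, plain white background."


def mirrored_pairs(objects=OBJECTS):
    """Return (original, mirrored) prompt pairs that differ only in the direction phrase."""
    pairs = []
    for first, second in combinations(objects, 2):
        relation = "to the left of"
        original = make_prompt(first, relation, second)
        mirrored = make_prompt(first, MIRROR[relation], second)
        pairs.append((original, mirrored))
    return pairs


if __name__ == "__main__":
    for original, mirrored in mirrored_pairs():
        print(original)
        print(mirrored)
        print()
```

If the two images in a pair look the same, or if both place the first object on the same side, that is a sign the direction word was not interpreted. Keeping everything else in the prompt fixed makes it easier to attribute any difference to the direction phrase alone.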

What we can do is either wait for the AI to mature on its own or accelerate it by pushing for more relevant applications. As for how to use those applications, I have prepared the details.

For me, DALL·E has never been just a tool for creating beautiful images. The first use I found for it was as a tool for learning and for developing interpretation and communication skills based on perception through human eyes.

We have been able to develop techniques for using DALL·E that go beyond gen IDs, seeds, and noise, concepts that have become knowledge that confines us. Beyond them, you can create dozens of images from a single message to DALL·E and develop them for various uses, building an understanding of the resulting images and of how much they have been randomly changed. What hasn’t changed is the need for proper use of gen IDs and appropriate use of GPTs in learning visualization. We haven’t had much discussion or shared understanding of these things.