I have read through many of these issues and test cases and, like many here, have experimented with DALL-E 3 through both the UI and the API. What I have found matches what many others have reported: even with very specific prompts, control is limited, and keeping characters consistent (e.g. for an illustrated book) is difficult at best and nearly impossible at worst.
For example, when populating a scene with a single character, the size of the character changed from small to large, so there is no control over basic morphology (which could be a sub-description: think of a class object structure with more precise object attributes and sub-attributes).
Back to character generation, the greater the number of characters in the scene, the more “confused” the generator became. So, a shirt defined for one character would suddenly be found on another, or both.
Number of characters. The generator would populate scenes with extraneous characters. I refer to this as “generative mitosis”. In addition, where characters were supposed to include only animals, the generator populated scenes with human or anthropomorphic figures (despite “no” or “absolutely no” statements in the prompt). This is especially true if you have animal characters in a story. Anthropomorphic and human figures are inserted constantly and affect at least 80% of the images produced. This includes wall art that has been placed in the images.
As mentioned earlier, if there was an object-oriented paradigm/model that could be “understood” by the DALL-E engine (or other LLMs for that matter), developers could define and modify these at will, resulting in much greater control over the images generated.
For the record, I have produced a Python class object structure that I use for greater control of prompt generation. Combine this with a DALL-E interface, and we might really have something.
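To make the idea concrete, here is a minimal sketch of what a class structure for prompt generation could look like. It is not the structure mentioned above: the class names (`Clothing`, `Character`, `Scene`), their attributes, and the prompt wording are all hypothetical, chosen only to illustrate attributes, sub-attributes, an explicit character count, and an exclusion list.

```python
from dataclasses import dataclass, field


@dataclass
class Clothing:
    # Sub-attribute of a character (hypothetical fields).
    item: str
    color: str


@dataclass
class Character:
    name: str
    species: str
    relative_size: str  # e.g. "small", "medium", "large"
    clothing: list[Clothing] = field(default_factory=list)

    def describe(self) -> str:
        # Keep each character's clothing inside its own description,
        # so attributes are never stated apart from their owner.
        text = f"{self.name}, a {self.relative_size} {self.species}"
        if self.clothing:
            worn = " and ".join(f"a {c.color} {c.item}" for c in self.clothing)
            text += f" wearing {worn}"
        return text


@dataclass
class Scene:
    setting: str
    characters: list[Character]
    exclusions: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # State the character count explicitly, then one line per character.
        count = len(self.characters)
        noun = "character" if count == 1 else "characters"
        parts = [f"{self.setting}. Exactly {count} {noun}:"]
        parts += [f"- {c.describe()}" for c in self.characters]
        if self.exclusions:
            parts.append("Do not include: " + ", ".join(self.exclusions) + ".")
        return "\n".join(parts)


fox = Character("Fenn", "fox", "small", [Clothing("scarf", "red")])
owl = Character("Olla", "owl", "large")
scene = Scene(
    "A forest clearing at dusk",
    [fox, owl],
    exclusions=["humans", "extra animals", "wall art"],
)
print(scene.to_prompt())
```

A structure like this only guarantees that the prompt text stays consistent from image to image. It does not guarantee that the generator honors it, which is exactly the gap described above.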
Finally, with respect to seeds: if the seed could be a wrapper around the “embedded” object model, it could also be secured (think an API within an API) and accessed via authentication, with the necessary supporting encryption, to allow external, dynamic modification of the instances that define the derivative definitions used to generate the desired target images.
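As a rough illustration of one piece of this, the sketch below derives a repeatable integer seed from a serialized character definition and signs that definition with an HMAC, so a modified definition can be detected. Everything here is an assumption made for illustration: the function names, the dictionary layout, and the idea that an image API would accept such a seed. It covers authentication of the definition only, not encryption.

```python
import hashlib
import hmac
import json


def canonical_bytes(model: dict) -> bytes:
    # Serialize with sorted keys so the same definition always
    # produces the same bytes, regardless of key order.
    return json.dumps(model, sort_keys=True, separators=(",", ":")).encode("utf-8")


def derive_seed(model: dict) -> int:
    # Hypothetical: take the first 4 bytes of a SHA-256 digest as a 32-bit seed.
    digest = hashlib.sha256(canonical_bytes(model)).digest()
    return int.from_bytes(digest[:4], "big")


def sign(model: dict, key: bytes) -> str:
    # HMAC-SHA256 tag over the canonical serialization.
    return hmac.new(key, canonical_bytes(model), hashlib.sha256).hexdigest()


def verify(model: dict, key: bytes, signature: str) -> bool:
    # Constant-time comparison of the expected and supplied tags.
    return hmac.compare_digest(sign(model, key), signature)


key = b"demo-key-not-for-production"
model = {"name": "Fenn", "species": "fox", "relative_size": "small"}
tag = sign(model, key)
seed = derive_seed(model)

modified = dict(model, relative_size="large")
print(verify(model, key, tag))     # unchanged definition
print(verify(modified, key, tag))  # altered definition
```

The point of the design is that the definition, not an opaque number, is the source of truth: the seed is recomputed from it, and only holders of the key can produce a definition that verifies.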
Please share your thoughts here or, if needed, feel free to contact me at william.collins@alkemietechnologies.com