DALLE-3 ON API - Seed support

The Control Is All We Need

The primary challenge with image generative models lies in exercising control over the outcomes. For creating beautiful but somewhat random art, Midjourney already serves that purpose.
DALLE 3 marked a significant breakthrough in prompt comprehension, thereby enhancing our control over the results tenfold.
Although it cannot be fully controlled yet, any means of augmenting this control is invaluable.
Using seeds is one such tool, and it can drastically reduce the number of ‘wasted’ images generated in the process of obtaining the desired result.

Try the prompt below, or something like it. To achieve consistency without a seed, you need to add more detail to your prompt to reduce noise.

Create a compelling Japanese anime-style illustration with a focus on dramatic lighting and crisp, fluid lines. The scene is set in an ancient dungeon, dimly lit with a blue, mystical glow that adds an air of tension and suspense.

At the heart of the dungeon is a magnificent beast, grand and intimidating. It should be depicted with a muscular build and its fur predominantly white, accented with blue stripes or patterns that resonate with its glowing blue eyes. The eyes are to be drawn with a luminous quality, suggesting a magical or supernatural power, matched by a radiant blue mark on its forehead.

The creature is adorned with elaborate armor-like decorations that give it a regal and formidable appearance. Its claws and teeth are sharp and fearsome, gleaming with a hint of the same eerie blue light. Its ears are alert and its posture should convey readiness for battle, while its long, bushy tail creates an imposing silhouette.

The dungeon’s architecture is composed of aged stone, conveying a history of ancient and mystical events, with the occasional relic or artifact adding depth to the background. Puddles on the floor reflect the beast’s glowing presence, enhancing the dynamic and enigmatic quality of the illustration.

This image should capture the essence of an anime-style encounter, highlighting the detailed and majestic beast in a setting that complements its mystical and powerful aura. The careful portrayal of the beast’s features and the dungeon’s ambiance is crucial for an authentic and consistent anime representation.


I have custom instructions set so that once I lock onto an image I like, it keeps that seed for the rest of the conversation unless I say otherwise.

That way when I make small changes… it (hopefully) doesn’t completely change the image. Let’s say I’m trying to have a storyline going and want to keep it somewhat consistent.

But now, I can’t do that. Unless I want to write an essay for each and every image prompt, I have zero consistency.

The new gen IDs don’t really work and are clunky. Change the color of one person’s shirt in a setting? Now it randomizes the whole setting: different style, time period, location, number of people, ethnicities, etc.

Before, I could basically find a random scene I liked, then tweak it with small changes to get it just right.

Now that will require a small novel.

I hope whatever reason you guys made this change for, it was worth it.

You are partially correct, especially when it comes to real photographs or AAA game scenes.

However, there are a couple of issues to consider:

  1. Many art styles (unlike photographs) cannot be accurately conveyed in words unless they are already well known and popular, such as the DC comic style.

  2. Even if you try to provide a detailed description, you are still limited to 256 tokens.

  3. Lack of diversity. In your example, consistency is indeed maintained to a certain extent, but it also means that anyone who knows your prompt can create images similar to your style. IMO, this possibility is very high.

This is why seeds are crucial: they can convey the artistic styles we desire that cannot be described in words.

The beast image you generated may look better than mine, but what if I just want the style of my beast image?

Also, a seed can save tokens to some extent, because it carries a lot of predefined information.

@owencmoore I have added more use cases for seed.

I agree with you that they should reintroduce the seed, but for different reasons than what you’re saying. Having a seed is crucial for scaling any business, especially at a commercial level. Without it, only those with substantial financial resources can maintain consistency. Consider creating a 40-page graphic novel: achieving character consistency is definitely possible, but probably financially out of reach for most without a seed. By removing the seed, OpenAI seems to be favoring businesses that can afford to extensively use the API for desired results, overlooking individual artists who can’t afford such high-frequency usage. It’s possible that this is an actual business strategy, as it would be more profitable for OpenAI. But it alienates smaller creators who lack the resources to generate numerous high-definition images rapidly, so they’re just going to flock back to Midjourney.

Regarding style consistency, it can be achieved in any genre in my experience. I would recommend moving beyond referencing specific styles, like ‘Pixar style’, for commercial projects though. DALL-E 3 can rewrite these references, but delving into the origins of these styles will yield even better outcomes than simply naming a style. Dig into where the Pixar style actually comes from; it’s pretty interesting. Also, if you’re incorporating a particular artist’s style in your prompts, you should shift away from this practice unless you have the legal rights to use those styles. Eventually Midjourney will have to moderate styles as well, but they don’t have an API yet, so their product isn’t really commercially viable for businesses (so it’s currently a minor concern, is my understanding). Even if you build your own API via a Discord bot (I actually did this), you get a maximum of 12 concurrent images, so it just doesn’t scale. You have to set up a queue, and it’s a nightmare to manage for more than a couple of users. By contrast, a queue with DALL-E 3 works like a charm.

As for the security of prompts, I haven’t seen any AI tool capable of extracting or closely replicating my prompts. Tools like GPT Vision or Midjourney /describe might guide you in the right direction, but their outputs aren’t directly usable; you still have to put in the hard work. It’s important to protect your prompts, though, and not expose them in the front end of your apps. Encrypting them is not a bad idea.

I have not experienced issues with diversity but I do understand what you’re saying and it’s a valid point.

I have not tried this, but could you not add a seed in the prompt itself? Since the prompt changes for each seed value, it would become a different image. You could also add elements that make the image variable and effectively infinite, like running a script that auto-picks colors, so that each time you submit the prompt a color is changed and thus the image changes too. DALL-E seems to respond fairly well to minor changes.
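A rough sketch of that idea, assuming the official openai Python SDK is available; the prompt template, the pseudo-seed phrase, and the color list are made up for illustration (DALL-E does not actually treat them as a seed):

```python
import random

from openai import OpenAI  # assumes the official openai SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative template: the pseudo-seed tag and the swapped color are the
# only parts that change between runs, so the rest of the scene description
# stays identical. (DALL-E does not treat this as a real seed; it is just a
# way to keep most of the prompt constant while varying one detail.)
TEMPLATE = (
    "Variant {tag}. Japanese anime style, dramatic lighting, crisp lines. "
    "A white-furred beast in an ancient stone dungeon lit by a {color} "
    "mystical glow."
)

COLORS = ["blue", "teal", "violet", "amber"]

for i in range(4):
    prompt = TEMPLATE.format(tag=f"{i:03d}", color=random.choice(COLORS))
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    print(prompt)
    print(result.data[0].url)
```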

I have read through many of these issues and test cases, and have experimented, as many have here, via the DALL-E 3 UI and via the API. What I have found is what many have found here: that even with very specific prompts, the current level of control, especially consistency of characters (e.g. those that might be used for an illustrated book), is difficult at best and nearly impossible at worst.

For example, when populating one character within a scene, the size of the character changed from small to large, so there is no control over basic morphology (which could be a sub-description; think of a class object structure with more precise object attributes and sub-attributes).

Back to character generation, the greater the number of characters in the scene, the more “confused” the generator became. So, a shirt defined for one character would suddenly be found on another, or both.

Number of characters: the generator would populate scenes with extraneous characters; I refer to it as “generative mitosis”. In addition, where the characters were supposed to include only animals, the generator populated scenes with humans (despite “no” or “absolutely no” statements in the prompt). This is especially true if you have animal characters in a story. Anthropomorphic figures and human figures inserted into the images are a constant problem and infect at least 80% of the images produced. This includes wall art that has been placed in the images.

As mentioned earlier, if there were an object-oriented paradigm/model that could be “understood” by the DALL-E engine (or other LLMs, for that matter), developers could define and modify these objects at will, resulting in much greater control over the images generated.

For the record, I have produced a Python class object structure that I use for greater control of prompt generation. Combine this with a DALL-E interface and we might really have something.
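For readers curious what such a structure might look like, here is a minimal illustrative sketch (not the poster’s actual code; all class and attribute names are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """One entity in the scene, with the attributes we want held constant."""
    name: str
    species: str
    size: str            # e.g. "small", "towering"
    fur: str
    clothing: str = ""

    def describe(self) -> str:
        parts = [f"{self.name}, a {self.size} {self.species} with {self.fur}"]
        if self.clothing:
            parts.append(f"wearing {self.clothing}")
        return ", ".join(parts)


@dataclass
class Scene:
    setting: str
    style: str
    characters: list[Character] = field(default_factory=list)

    def to_prompt(self) -> str:
        cast = "; ".join(c.describe() for c in self.characters)
        return f"{self.style}. Setting: {self.setting}. Characters: {cast}."


# The character objects stay fixed across scenes; only the setting varies.
beast = Character("Kiro", "beast", "towering", "white fur with blue stripes",
                  clothing="ornate armor-like decorations")
print(Scene("an ancient stone dungeon with a blue glow",
            "Japanese anime style, dramatic lighting", [beast]).to_prompt())
```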

Finally, with respect to seeds: if the seed could be a wrapper around the “embedded” object model, it could also be secured (think an API within an API), accessed via authentication and the necessary supporting encryption, to allow external, dynamic modification of the instances that define the derivative definitions used to generate the desired target images.

Please provide your thoughts here or if needed, feel free to contact me at william.collins@alkemietechnologies.com


Having seed support would make image generation via the API (or chat) useful for illustrative purposes on blog posts or other media where a consistent image style is required, which is often the case. Without a seed parameter, even with detailed prompting, the images simply vary too much in color, composition, and general style to provide a consistent output that can be used professionally. At best, the DALL-E service currently requires a ton of effort to meticulously define a long and extremely detailed prompt, which is then very difficult to transfer if one wants an image in the exact same styling, color, etc., but with modified content. After a period where this was available in ChatGPT, I now see no other option but to return to Midjourney.

That’s simply not accurate. While a seed or image reference would be extremely valuable, and I would love that to bring even more consistency and complexity, I’m able to produce 50-page books every minute via the API in one go, with extremely consistent characters and clothing. Any ethnicity or gender works; it hot-swaps while keeping everything else in the image the same. No editing, no inpainting, no seeds, no nada needed. The key lies in your prompts. Keep trying and you’ll figure it out.


Hi there! Thank you for your contributions. I’m curious if you were able to attain this level of consistency out of the box via the API and clever, careful prompt engineering or if you had to do some “culling” of a larger image set to select only those that were consistent. Even with detailed, descriptive, consistent character descriptions, I’m still struggling to attain this level of consistency via the API.

Likewise, have you had any luck with multi-character scenes? This is where things still really break down for me in terms of consistency.


Hello. Can we now choose a seed? Do we have this freedom? If anyone knows how to do it, I would appreciate it if you could share. Thanks.

The key lies in your prompts. Keep trying and you’ll figure it out.

Could you share an example of a prompt you might use?

Could you share your progress? @alexh I really need your tips.

I would like to +1 the wish for a seed parameter in DALL-E 3’s API. It is actually quite interesting how often the exact same prompt structure (which I carefully crafted after testing) can produce different results in ChatGPT (when I finally get it to use only the prompt I provide) and with the DALL-E 3 API. Even with prompts of the same character length, asking it to generate something in a certain style is inconsistent more often than not.
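One way to see part of what is going on, assuming the official openai Python SDK: the DALL-E 3 API returns a revised_prompt field alongside each image, so you can log how far the service drifted from the prompt you submitted (the example prompt below is just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

prompt = "A watercolor lighthouse on a cliff at dusk, muted palette, soft edges."

result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
image = result.data[0]

print("submitted:", prompt)
print("revised:  ", image.revised_prompt)  # what DALL-E 3 actually rendered
print("url:      ", image.url)
```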

There’s a Reddit post on consistent characters in DALL-E 3. It can help you narrow down some aspects of character style using your prompt, but it’s still not always consistent, and I think a seed parameter would address that.

For a production-level application, I wonder whether it’s a matter of a technological leap, or of bringing scale to higher-powered models, that would eventually address the overall problem, if it’s not a seed.


A seed just makes sense. Why are we going against the grain?


I agree with all the people who posted here, and I urgently need seed support in the API.
In some use cases you just can’t be that specific with your prompts, and you still need to quickly find a fitting image and adjust it minimally without trying multiple times…


Agree, I would love a seed, as it would help significantly with clothing and facial consistency and ultra-complex scenes. But you don’t actually need a seed for a lot of different images and styles. I created all the images below via the API in one attempt, in under a minute, just now. I’m doing it programmatically, adjusting the prompt for each scene from a movie script I wrote, but the characters are basically the same. No seed, no reference IDs. And that took me about 40 seconds via the API. I’m very interested to see whether a seed actually improves things. You would definitely need a seed or a reference image/ID, though, if you were trying to get your characters to be more personalized and look like yourself, for example. Or DALL-E would need to be able to handle much longer prompts so you could describe characters in hyper-detail.
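Something along these lines is presumably what is meant; a minimal illustrative sketch, assuming the official openai Python SDK (the character block and scene list are invented for the example, not the poster’s actual prompts):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Constant character block, repeated verbatim in every prompt so the model
# receives the same description for each scene.
CHARACTER = (
    "Milo, a slim grey tabby cat with green eyes, wearing a navy blue scarf"
)

SCENES = [
    "sneaking across a moonlit rooftop",
    "reading a newspaper in a rainy cafe window",
    "taking a bow on a jazz-club stage",
]

for scene in SCENES:
    prompt = (
        f"Japanese anime style, consistent character design. {CHARACTER}, "
        f"{scene}. Same character in every image."
    )
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    print(result.data[0].url)
```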

Looks nice. Although look at the right image in the second row. A symptom I described in October:

Thanks. Actually, “tuxedo cat” is not in any of my prompts; I specifically wanted him wearing a tuxedo in that image. But you’re right, if you say “tuxedo cat” it sometimes randomly hallucinates a tuxedo. I got around that by just describing the cat’s fur instead of using the word “tuxedo”.
