A few Dall-E-3 feature requests

  1. raw prompts

It would be nice if we could tag a request to mean “send this prompt to the image generator unaltered” (after safety checks).

Primary use case is it allows you to take the prompt generated for a previous image and tweak it without the AI rewriting it. Could be extra valuable if you add seed support.

  1. split prompt expansion from image generation

This is the other half of #1: allow us to get the expanded prompt without ever actually generating an image from it.

Use case: I’ve had a problem where I generated a series of images that were supposed to be photographs, but the txt2txt AI removed the word photograph (e.g. “color photograph” became “colorful image”) and the txt2img interpreted this as allowing an illustration. In the space of 15 image generations I got 12 photograph ones and 3 cartoony illustrations.

If I can intervene between the txt2txt and the txt2img, I could programmatically re-add basic requirements like “color photograph” if they got removed. This COULD be done by giving me some kind of mixed raw prompt & expanding prompt (e.g. pass in two prompts, one which is raw and is prepended/appended to the expanded prompt), but I think preferable is just allowing me to generate the prompt without generating the image, and then generate the image from a raw prompt, so that I can do anything I want in between.

Of course OpenAI could try to address the specific use case I mentioned differently, e.g. improve the txt2txt AI piecemeal to address the specific case I ran into, or, since AIs are difficult to control, just write your own postprocessor that adds in key stylistic things like “photograph” if the txt2txt rewrite lost them, but I think exposing in the API is more useful.

  1. PNG output

I’m hoping this is just a slow rollout given Dall-E-2 outputs PNGs, but just in case:

Currently ChatGPT will let you download PNGs for Dall-E-3, but the API sends WebP. If I’m paying $0.12 for a 1792x1080 image, I think you can send it as a PNG.

2 Likes

This is available… See here…

1 Like

I suspect #2 can be achieved by just sending chat completition requests to the GPT-4 model asking it to write a prompt, but some guidance on how exactly to do this to get similar results to the Dall-E-3 prompt rewriter would be nice. For example, what’s a good system pre-prompt?