You mean like this?
There are a few issues here that I think are worth noting for anyone who ends up thinking about this.
- Try as I might, GPT-4 (correctly) wants image generation to be “async” and not delay anything in the flow. ChatGPT does not really have mechanisms for dealing with async jobs, so this would be a rather complex change there. In Discourse we do have mechanisms for this, e.g. we can create a special
  [generate=image description here]
  piece of markdown that handles this…
- DALL-E is far behind Midjourney, and even some interesting Stable Diffusion models, on two counts: (1) the deny list of words is oppressive, and (2) the images it generates are not on par with the latest diffusion models.
- Getting this right will take tons of iterations (regenerating images, fine-tuning images, etc.).
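
To make the `[generate=image description here]` idea a bit more concrete, here is a rough sketch of how such a placeholder could be picked out of a post and handed off to a background queue, so the post itself is never blocked on image generation. This is not how Discourse implements it. The regex, the function names (`extract_image_jobs`, `enqueue_image_jobs`, `replace_placeholder`) and the in-memory queue are all assumptions made up for illustration.

```python
import re
from queue import Queue

# Hypothetical pattern for the placeholder syntax mentioned in the post:
# [generate=image description here]
GENERATE_TAG = re.compile(r"\[generate=([^\]]+)\]")


def extract_image_jobs(post_markdown: str) -> list[str]:
    """Return the description inside every [generate=...] placeholder."""
    return [match.group(1) for match in GENERATE_TAG.finditer(post_markdown)]


def enqueue_image_jobs(post_markdown: str, queue: Queue) -> int:
    """Queue one job per placeholder; a background worker would drain the queue.

    Returns the number of jobs queued. The post can be saved immediately,
    with the placeholders left in place until the images are ready.
    """
    descriptions = extract_image_jobs(post_markdown)
    for description in descriptions:
        queue.put(description)
    return len(descriptions)


def replace_placeholder(post_markdown: str, description: str, image_url: str) -> str:
    """Swap a finished job's placeholder for a normal markdown image."""
    tag = f"[generate={description}]"
    return post_markdown.replace(tag, f"![{description}]({image_url})", 1)
```

A worker would take a description off the queue, call whichever image model is in use, and then call `replace_placeholder` with the resulting URL. Regenerating an image would just mean queueing the same description again.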