I was time-constrained and therefore admittedly went with this route, i.e. asking the model to return a detailed description of the original image, which I then used as input for the recreation of the image with DALL-E.
But tried a couple of different options in terms of prompt granularity.
2 Likes