DALL-E API Generate images starting from two or multiple images

The desired output is get a series of generated images starting from two images.
Right now I’ve tested DALL-E from the chat and right now the flow is this:

  1. I’ve inserted two images
  2. From that the AI extrapolated every information from those images
  3. Generates 2 new images starting from the information it received

How can I replicate this behaviour through APIs?
From the documentation I see that the variation from the image starts from one and only image.
Thank you

I’d imagine you could send the images to GPT4 via API asking it to describe the image (using a system prompt telling it to only output the description and nothing else), then take that response and send it to the DALLE API.