How to call apt with input Text+image and return image?

I need to input Text and images to get a image (Text like “create a image according to this images…”), gpt4 interface can meet my need,but I don’t know how to realize it with api, except to descibe images with texts. Can any one help me ?

were you ever able to solve this? thanks

This is not currently a feature one single API endpoint can perform.

You could get an LLM endpoint to generate a prompt for the dall-e endpoint though. Although this would not contain and specific likenesses of anything in the image, just generic descriptions of those things,