How to call the API with text + image input and get an image back?

I need to send text and images as input and get an image back (with a prompt like “create an image based on these images…”). The GPT-4 interface can do this, but I don’t know how to achieve it through the API, other than by describing the images in text. Can anyone help me?