How to generate an image and text at the same time via the API? Thanks

When using GPT-4V in the web app, for example, we can ask it to create an image around a topic and also generate a text explanation of the topic. Is it possible to do the same thing via the API? The documentation says this is not possible: we can only use DALL-E 3 to create an image, and then use GPT-3.5 or GPT-4 to interpret the image. Any comments are highly appreciated.


DALL-E and GPT-4 are two completely separate models (as far as I know). ChatGPT rewrites your prompt and then calls DALL-E on the rewritten prompt.

You could maybe accomplish something similar using the Assistants API, but you might be better served by running your own chains :thinking:

Hi, thank you. What I meant was to use GPT-4V to generate an image and text at the same time, not to use DALL-E. What I want to achieve is to create an image and an explanation of that image. In the web app we can do this with a single prompt, e.g., "create an image about a teacher and interpret the image". I am wondering how we can do this via the API. One approach, of course, is to use DALL-E to generate an image and then use GPT-4 to interpret it, separately. I want to know whether GPT-4V can do the two tasks together.

gpt-4-vision-preview can’t generate images, if that’s what you’re really asking :thinking:

Okay. So that means that when we use the web app to create an image and a text explanation, it actually calls two models: DALL-E to create the image, and then GPT (3.5 or 4, whichever) to write the text explanation?

To be honest, I don’t know whether ChatGPT actually calls vision when you ask it to describe what it just generated. It might just describe the text description (the DALL-E prompt) that it generated.

Thanks a lot. It would be great if OpenAI published which models each web-app call uses. (Sometimes we can tell by analysing the web app.)

Hi, have you figured this out? I have the same question: how can I generate a text explanation of generated images?

The original question can be achieved with function calling, using DALL-E for image creation and GPT-4V for the image analysis.

However, if the image is generated by DALL-E 3, you probably do not need to call GPT-4V at all if the information in the revised_prompt field of the output is sufficient for your needs, since it already describes what was created.

For example, you send a prompt, “create an image of a person looking at the cherry blossoms”, to chat completions with function calling. Your function will then be invoked and passed the prompt. However, DALL-E 3 will add more detail to your prompt when it generates the image, and will send the expanded version back via “revised_prompt” in the output.
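A minimal sketch of the image-generation part of that flow, using the REST endpoint directly with only the standard library. The endpoint and field names follow the public API reference (POST /v1/images/generations, with revised_prompt in the response), but treat this as an outline rather than a drop-in implementation; it assumes OPENAI_API_KEY is set in the environment.

```python
# Sketch: call DALL-E 3 and read back "revised_prompt" (stdlib only).
# Assumes OPENAI_API_KEY is set; not production-ready error handling.
import json
import os
import urllib.request


def build_image_request(prompt: str) -> dict:
    # Request body for POST https://api.openai.com/v1/images/generations
    return {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": "1024x1024"}


def generate_image(prompt: str) -> tuple[str, str]:
    """Returns (image_url, revised_prompt). DALL-E 3 expands the prompt
    before generating and sends the expansion back in revised_prompt."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/images/generations",
        data=json.dumps(build_image_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["data"][0]["url"], data["data"][0]["revised_prompt"]


# Usage (requires a valid API key and network access):
# url, revised = generate_image(
#     "create an image of a person looking at the cherry blossoms")
# print(revised)  # often descriptive enough on its own
```

In a function-calling setup, generate_image would be the handler your chat-completions tool invokes with the prompt the model chose.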

If “revised_prompt” is not enough, you can still send the image to GPT-4V for analysis within the same function code block, then send the result back to the original chat completions call for a summary.
