GPT-4 Images - techniques for determining whether a prompt requires an image or a text-based response

Hi Everyone,

GPT-4’s web interface seems to let users submit normal chat prompts that return text-based responses as well as image prompts that return images, without having to switch models (i.e. explicitly specifying DALL·E). This behavior is great, as users don’t have to think about which model to pick when moving between text and image requests.

When looking into the API, however, it appears that GPT-4 requests can’t return images and that DALL·E has to be specified.

Is GPT-4 in the web interface actually returning images itself (meaning the API is just not updated yet), OR is the interface somehow understanding what the user is asking and calling DALL·E instead when image prompts are entered?

If the web interface is “switching” models, does anyone know how that technique can be replicated? Explicit commands (like /imagine) are obviously easy to handle, but I would like to replicate the smoothness of the web interface, if that makes sense.

Hi and welcome to the Forum!

ChatGPT and the APIs are two separate products.

ChatGPT integrates multiple different capabilities, as you rightly pointed out.
In an API context, these capabilities are handled through different endpoints (i.e. the chat completions endpoint, the image generation endpoint, etc.), which need to be called separately depending on the request.

If you are looking to create an interface that replicates the ChatGPT experience, then you’d have to implement logic that identifies the intent of a user request. Based on the identified intent, you would then call the appropriate API in the backend and return the output to the user.
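
For reference, calling the two endpoints separately with the OpenAI Python SDK looks roughly like this. It's just a minimal sketch; the model names gpt-4 and dall-e-3 are assumptions and depend on what you have access to:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat completions endpoint: returns a text response
chat = client.chat.completions.create(
    model="gpt-4",  # assumed model name
    messages=[{"role": "user", "content": "Explain how tides work."}],
)
print(chat.choices[0].message.content)

# Image generation endpoint: returns an image URL
image = client.images.generate(
    model="dall-e-3",  # assumed model name
    prompt="A watercolor painting of a lighthouse at dusk",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)
```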

Thank you for that explanation. It completely makes sense. I suspect simply calling GPT-4 first to determine the best model, and then submitting the request to the appropriate model, would be the most versatile option.

I tested this just now and GPT-4 seems to know which model is best to run and returns it to me. I just need to tune it so I always get an expected response.

Well, you don’t want to leave the specific model choice to GPT-4. It’s not “aware” of all the latest models available due to the training cut-off date, so it might hallucinate a response in this regard.

Hence the recommendation to identify the intent and then, based on the identified intent, have logic in place for which specific endpoint to call. Perhaps this is what you meant and I just misunderstood.

Identifying intent was what I was getting at; sorry if that wasn’t clear. If the only other model I need to use is a “DALLE” model, GPT seems to do a really good job of recognizing when it should be used (in quick testing). So basically, if I instruct GPT to identify the intent and limit its responses to either “GPT” or “DALLE”, the intent response can be extracted and used to select the appropriate model to call in the API.

Essentially the method requires two responses to a user’s input: one to identify intent (at which point the program selects the appropriate model), and another to actually respond.
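
To make that concrete, here is a minimal sketch of the two-call flow. The model names, the classification prompt, and the fallback to a text response on an unexpected label are all assumptions to tune for your own setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VALID_LABELS = {"GPT", "DALLE"}

def pick_model(user_input: str) -> str:
    """First call: ask the chat model to label the request as GPT or DALLE."""
    resp = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word, GPT or DALLE, indicating "
                        "which model should handle the user's request."},
            {"role": "user", "content": user_input},
        ],
    )
    label = resp.choices[0].message.content.strip().upper()
    return label if label in VALID_LABELS else "GPT"  # fall back to text on anything unexpected

def respond(user_input: str) -> str:
    """Second call: route the request to the endpoint selected above."""
    if pick_model(user_input) == "DALLE":
        image = client.images.generate(
            model="dall-e-3",  # assumed model name
            prompt=user_input,
            n=1,
            size="1024x1024",
        )
        return image.data[0].url
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}],
    )
    return chat.choices[0].message.content
```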

Yes, as a general logic this makes sense.
