How to Enable GPT-4o to Generate Images

Is there any way to enable GPT-4o to generate images? Currently, we are utilizing function calls to achieve image generation, but there are some issues:

  1. Each user interaction requires a function call, even if the user does not want to generate an image.
  2. After generating an image, we need to invoke GPT-4o to identify the image, which wastes tokens and increases user wait time.

Does anyone have any good suggestions or solutions for these problems? How are others handling this situation? Any insights or advice would be greatly appreciated. Thank you all for your assistance!

A simple solution could be to cache the phrases that trigger the function in a database. On each request, you first load all the phrases from the DB and look for them in the user's message.

If a phrase is found, you make the call to GPT with the function.

If no phrase is found, you ask a model to identify a phrase in the message that could express the intent “create an image”, check that it actually appears in the user prompt, and store it in the database (maybe even with a counter to see how often it is used).

After a couple hundred requests you should have enough phrases to skip that second request.

And if still no phrase is found, you call GPT without the function.

like this…
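The routing described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the `PhraseCache` class, the `wants_image` helper, and the `detect_phrase` callback (a stand-in for the second model request) are hypothetical names, and SQLite is used just to keep the example self-contained.

```python
import sqlite3
from typing import Callable, Optional


class PhraseCache:
    """Hypothetical store for phrases that signalled an image request."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS phrases ("
            "phrase TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
        )

    def find(self, prompt: str) -> Optional[str]:
        """Return the first cached phrase contained in the prompt, if any."""
        text = prompt.lower()
        rows = self.db.execute("SELECT phrase FROM phrases").fetchall()
        for (phrase,) in rows:
            if phrase in text:
                return phrase
        return None

    def record(self, phrase: str) -> None:
        """Store the phrase if it is new, then bump its usage counter."""
        key = phrase.lower()
        self.db.execute(
            "INSERT OR IGNORE INTO phrases (phrase, hits) VALUES (?, 0)", (key,)
        )
        self.db.execute(
            "UPDATE phrases SET hits = hits + 1 WHERE phrase = ?", (key,)
        )
        self.db.commit()

    def hits(self, phrase: str) -> int:
        row = self.db.execute(
            "SELECT hits FROM phrases WHERE phrase = ?", (phrase.lower(),)
        ).fetchone()
        return row[0] if row else 0


def wants_image(
    prompt: str,
    cache: PhraseCache,
    detect_phrase: Callable[[str], Optional[str]],
) -> bool:
    """Decide whether the main GPT call should include the image function."""
    phrase = cache.find(prompt)
    if phrase is None:
        # Cache miss: ask a model (assumed callback) for a candidate phrase.
        candidate = detect_phrase(prompt)
        # Only trust the candidate if it really occurs in the user prompt.
        if candidate and candidate.lower() in prompt.lower():
            phrase = candidate
    if phrase is None:
        return False  # call GPT without the function
    cache.record(phrase)
    return True  # call GPT with the function
```

When `wants_image` returns `True` you would attach the image function to the request, otherwise you would send the request without it. Once the cache has filled up, most image requests are matched by `cache.find` and the `detect_phrase` call is skipped.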

Hmm, or you could even build that into the function itself: when an image is created, have the model also return the intent/phrase that triggered the image creation and store that in the database. That would reduce the costs even more.
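One way this might look, as a rough sketch: the function definition gets an extra argument in which the model reports the triggering phrase. The function name `generate_image`, the `trigger_phrase` parameter, and the `handle_tool_call` helper are all assumptions made for illustration; the dictionary follows the general shape of a Chat Completions `tools` entry, and a plain dict stands in for the database.

```python
import json
from typing import Dict

# Hypothetical tool definition; "trigger_phrase" is an assumed extra parameter.
IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an image from a text description.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "Description of the image to generate.",
                },
                "trigger_phrase": {
                    "type": "string",
                    "description": "Exact words from the user's message "
                    "that asked for an image.",
                },
            },
            "required": ["prompt", "trigger_phrase"],
        },
    },
}


def handle_tool_call(
    arguments_json: str, user_message: str, known_phrases: Dict[str, int]
) -> str:
    """Record the reported phrase and return the image prompt."""
    args = json.loads(arguments_json)
    phrase = args.get("trigger_phrase", "").strip().lower()
    # Keep the phrase only if it really appears in the user's message.
    if phrase and phrase in user_message.lower():
        known_phrases[phrase] = known_phrases.get(phrase, 0) + 1
    return args["prompt"]
```

Because the phrase arrives with the same function call that creates the image, no separate intent-detection request is needed to fill the phrase list.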