Agents that can generate images

I’m using the Agents SDK and was wondering how I could get an agent to generate an image. You can’t set the model of an agent to an image gen model like dall-e-3 right? So I need to make a function tool like:

@function_tool
async def generate_image(prompt: str) -> str:
  # do image gen here using something like openai.image.generate
  return image_url

and then have the Agent use that tool. And would I also have a custom output class like:

class ImageResult(BaseModel):
    url: str

and then output_type=ImageResult? Would I need to nudge the agent in the instructions to return the image url to the user?

You could be supplying an image URL link (one of your own, not OpenAI’s expiring link of dall-e-3 URL response) and hope the AI repeats it successfully, or makes a useful markdown link.

Better would be to seamlessly display those images in a user interface, and have the interactivity there.

Functions provide a service to the AI; it must find them useful for a task. “make_AI_images(prompt)” - pretty easy to understand the utility.

The function cannot return images. You would just return “1 image successfully generated and displayed for user” or a message that produces the needed AI language.

“Make an Image” is usually the end of any agentic path. You can stop the flow at that point. Then include the tool response when you are adding the next user input, just to let the AI know it was successful.

You can’t return the generated image’s url in a function tool? It seems like one of the purposes of these function tools is to interact with external APIs. In the Agents SDK Tools documentation, one of the examples is fetching weather data from an external weather API and returning the result.

Am I missing something here? Is what I proposed in my original post not feasible, not the intention of function tools or just bad practice?

I would expect the agent’s output result would be {“url”: “https://theurlofthegenerateddalle3image.png”}

Sure, you can have the AI write “I just made you an image. Now you have to click the link. That is, if you are even using a web browser and not a sandboxed app. Hope that I repeated it back correctly for you.”

It simply isn’t a good user interface.

Here’s the playground doing that.