Returning image as tool output in Assistants API?

Hello!

I’m trying to figure out if there’s a way to return an image as a tool output for Assistants. I can think of several use cases where that would be useful (I’m making a small app that can call a “take screenshot” function so it can automatically see what’s happening on my computer), but the documentation doesn’t seem to cover that possibility.

Is it possible in one way or another, or planned? A workaround I can think of is to simply send the image in a separate message after the assistant calls the function, but I’m guessing that would waste tokens? From my testing, the assistant can’t help but post a message right after calling the function instead of waiting for the image to arrive in the next one, most of the time hallucinating a link to a screenshot that doesn’t exist.

EDIT (accidentally sent the message before I finished typing it, oops)


The only way I can see would be to prompt the Assistant in such a way that it outputs a response you can feed directly into an image generation model (DALL·E) to generate an image.

Yes, you are correct: an Assistant, once it invokes a tool_call, can only sit there and wait for your return value (“tool output”), and will then produce a reply based on that return value, which can’t be a binary file or an image for further vision understanding by the AI.

That means you don’t have the versatility of Chat Completions, where you are in complete control of the messages and functions on every call you make (less so with tools, where you are forced to return an output for the same ID that was called, or ignore the tool call entirely), and where you could place another user message in the context before or after the function result for vision to have a look at.
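
For illustration, here is a minimal Chat Completions sketch of that pattern, assuming the Python SDK and a vision-capable model; the ids, the assistant tool-call message, and the data URL are placeholders standing in for values from the earlier turn, not anything prescribed by the API docs:

from openai import OpenAI

client = OpenAI()

# Placeholders (assumptions, not real values): the assistant message that issued
# the tool call, its call id, and a data URL for the captured screenshot.
tool_call_id = "call_abc123"
assistant_tool_call_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": tool_call_id,
            "type": "function",
            "function": {"name": "take_screenshot", "arguments": "{}"},
        }
    ],
}
screenshot_data_url = "data:image/png;base64,iVBORw0KGgo..."  # placeholder

followup = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What's on my screen right now?"},
        assistant_tool_call_message,
        # Close out the tool call with a plain-text result...
        {"role": "tool", "tool_call_id": tool_call_id, "content": "Screenshot captured."},
        # ...then append a user message carrying the actual image for vision,
        # which Assistants threads won't let you do while a tool call is open.
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Here is the screenshot:"},
                {"type": "image_url", "image_url": {"url": screenshot_data_url}},
            ],
        },
    ],
)
print(followup.choices[0].message.content)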

About the only way in Assistants is to have a function return that has already performed image-to-text, with all the information the AI would answer about anyway, written in natural language.

Your function could take a parameter, e.g. get_user_screenshot(query=“what the AI would like to know about the screenshot”), and then make a separate Chat Completions vision API call with that query and the image.
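
A minimal sketch of that handler, assuming the Python SDK, Pillow for grabbing the screen, and gpt-4o as the vision model (all of those are my own choices for illustration):

import base64
import io

from openai import OpenAI
from PIL import ImageGrab  # Pillow; one way to capture the screen on Windows/macOS

client = OpenAI()

def get_user_screenshot(query: str) -> str:
    # Grab the screen and encode it as a base64 PNG data URL
    shot = ImageGrab.grab()
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

    # Ask a vision-capable chat model the assistant's query about the image
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": query},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    )
    # Plain text, ready to be submitted back to the Assistant as the tool output
    return response.choices[0].message.content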

Hi! I’ll be short.

  1. Upload the file using the Files API (it’s used by Assistants, Batch and other OpenAI APIs to reference files).
  2. Use the id from the POST /files response and submit the tool output with that file id. For example:
{"screenshot_file_id": "file-abc123"}

Perhaps too short - what use can a language AI in Assistants make of an internal file ID from the storage endpoint?

  • can’t see the image (the desire of this topic)
  • can’t provide the ID to an end-user (they’d need a hosted URL)

Only “user” role messages can include images. You cannot push a user message onto a thread while a tool call is still open, and submitting the tool output makes the AI write language about the results.