GPT using vision capabilities for images returned from actions?


If you are implementing actions for a GPT, is there a way to have the actions return image data or an image so that the GPT can display it and/or process it with its vision capabilities?

Background: I’m developing a GPT that can inspect details in the Java Content Repository in Apache Sling or Adobe AEM. Inspecting the nodes returns JSON data for the nodes, which works just fine. Reading JSPs, HTML, CSS and other text files works just fine, too. But it’d be nice if the GPT could at least display images, and much better if it could apply its vision capabilities to them. However, just returning the image data with the appropriate MIME type leads to a “ClientError” without any error message. Does anybody have an idea or trick I could apply?

(If somebody from OpenAI reads this: please please PLEASE apply the same care to the error messages given to ChatGPT. I had several cases of errors, such as the one above, where ChatGPT gets a plain “ClientError” without any text describing the actual problem, so it just cannot tell me what’s wrong, and I have to keep banging my head against the wall and guessing until I guess right.)

Thanks to OpenAI for all the marvellous stuff it provides, and thank you all for the interesting discussions here!
I wish you all a nice new year!



Bumping this. I’ve been trying to return PNG data with Content-Type image/png, which works in my browser, but the Action treats the response as an error. I saw some posts about returning a URL in the response and trying to get the GPT to display it to the user, but I’m more interested in having Vision analyze the changes that result from its Action.
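
For reference, the URL workaround mentioned above could look roughly like the sketch below: instead of returning the raw image bytes, the action returns a small JSON body that points at the image. The function name, the field names (`imageUrl`, `markdown`) and the `https://example.com/` base URL are all made up for illustration, and it assumes the image is reachable at a public URL. Whether the GPT actually renders the Markdown depends on its instructions, and this does not give the GPT vision access to the image.

```python
import json
import mimetypes
from urllib.parse import quote, urljoin


def build_image_action_response(repo_path: str, public_base_url: str) -> str:
    """Return a JSON body that points at an image instead of embedding its bytes.

    Hypothetical helper: the payload field names are assumptions, not part of
    any documented Actions contract.
    """
    # Percent-encode the repository path but keep the slashes between segments.
    image_url = urljoin(public_base_url, quote(repo_path.lstrip("/")))
    mime_type, _ = mimetypes.guess_type(repo_path)
    payload = {
        "path": repo_path,
        "mimeType": mime_type or "application/octet-stream",
        "imageUrl": image_url,
        # Markdown the GPT can be instructed to echo so the chat UI shows the image.
        "markdown": f"![{repo_path}]({image_url})",
    }
    return json.dumps(payload)


print(build_image_action_response("/content/dam/site/logo.png", "https://example.com/"))
```

The action would then declare an application/json response in its OpenAPI schema, which avoids the binary image/png response that triggers the error described here.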


GPT using vision capabilities for images returned from actions would indeed be very useful.

So we’re saying it’s not currently implemented? It’s not a bug on my part?

Based on what I’ve seen on this forum, it seems that it is not implemented.

I also have not seen any GPTs that appear to offer this functionality. If someone knows of one, then I would dive in to figure out how it’s done.
