GPT using vision capabilities for images returned from actions?

hpstoerr · January 5, 2024, 10:55am

Hi!

If you are implementing actions for a GPT, is there a way to have the actions return image data or an image so that the GPT can display and/or process it with its vision capabilities?

Background: I’m developing a GPT that can inspect details in the Java Content Repository in Apache Sling or Adobe AEM. Inspecting the nodes returns JSON data for the nodes, which works just fine. Reading JSPs, HTML, CSS and other text files works just fine, too. But it’d be nice if the GPT could at least display images, and much better if it could apply its vision capabilities to them. However, just returning the image data with the appropriate MIME type leads to a “ClientError” without any error message. Does somebody have an idea or trick I could apply?

(If somebody from OpenAI reads this: please, please, PLEASE apply the same care to the error messages given to ChatGPT. I have had several cases of errors, such as the one above, where ChatGPT gets a plain “ClientError” without any text describing the actual problem, so it just cannot tell me what is wrong, and I had to keep banging my head against the wall and guess until I guessed right.)

Thanks to OpenAI for all the marvellous stuff it provides, and thank you all for the interesting discussions here!
I wish you all a nice new year!

Hans-Peter

SomeUser2022 · March 14, 2024, 3:06pm

Bumping this. I’ve been trying to return PNG data with Content-Type image/png, which works in my browser, but the Action treats it as an error. I saw some posts about returning a URL in the response and trying to get the GPT to display it to the user, but I’m more interested in Vision analyzing changes as a result of its Action.
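
For reference, the URL-based workaround mentioned in those posts might look roughly like the sketch below: instead of raw image bytes, the action returns a small JSON body that points at the image. This is only an illustration under assumptions. The function name and the JSON field names (image_url, mime_type, markdown) are made up here and are not part of any OpenAI schema, and nothing in this thread confirms that the GPT will display or apply vision to an image referenced this way.

```python
import json


def build_image_action_response(image_url: str, mime_type: str = "image/png") -> tuple[dict, str]:
    """Build (headers, body) for a JSON action response that references an image.

    Hypothetical helper: the field names below are illustrative assumptions,
    not a documented contract for GPT actions.
    """
    # Assumption: the image is served from a publicly reachable HTTPS URL.
    if not image_url.startswith("https://"):
        raise ValueError("image_url must be an https:// URL")

    payload = {
        "image_url": image_url,
        "mime_type": mime_type,
        # Markdown image syntax, in case the GPT is instructed to echo it to the user.
        "markdown": f"![image]({image_url})",
    }
    # The body is JSON text rather than binary image data.
    headers = {"Content-Type": "application/json"}
    return headers, json.dumps(payload)
```

Whatever server framework backs the action would send these headers and this body in place of the binary image/png response.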

wfhbrian · March 14, 2024, 5:42pm

GPT using vision capabilities for images returned from actions would indeed be very useful.

SomeUser2022 · March 14, 2024, 7:39pm

So we’re saying it’s not currently implemented? It’s not a bug on my part?

wfhbrian · March 18, 2024, 10:22pm

Based on what I’ve seen on this forum, it seems that it is not implemented.

I also have not seen any GPTs that appear to offer this functionality. If someone knows of one, then I would dive in to figure out how it’s done.
