GPT using vision capabilities for images returned from actions?


If you are implementing actions for a GPT, is there a way to have the actions return image data or an image so that the GPT can display it and/or process it with its vision capabilities?

Background: I’m developing a GPT that can inspect details in the Java Content Repository in Apache Sling or Adobe AEM. Inspecting nodes returns JSON data for them, which works just fine. Reading JSPs, HTML, CSS and other text files also works just fine. But it would be nice if the GPT could at least display images, and much better if it could apply its vision capabilities to them. However, simply returning the image data with the appropriate MIME type leads to a “ClientError” without any error message. Does anybody have an idea or a trick I could apply?
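
To make the failing case concrete, here is a minimal sketch in Python of the kind of response I mean. This is an illustration only: my real backend is a Sling servlet, and the function name `build_image_response` and the file names are made up for this example. It returns the raw image bytes with a MIME type guessed from the file name, which is the variant that ends in the “ClientError” for me:

```python
import mimetypes


def build_image_response(filename: str, data: bytes) -> tuple[int, dict[str, str], bytes]:
    """Build (status, headers, body) for returning raw image bytes.

    Hypothetical helper for illustration; not part of any GPT actions API.
    """
    # Guess the MIME type from the file extension, e.g. "logo.png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        # Anything that is not recognizably an image is rejected.
        return 415, {"Content-Type": "text/plain"}, b"not an image"
    headers = {
        "Content-Type": mime_type,
        "Content-Length": str(len(data)),
    }
    # The body is the unmodified binary image data.
    return 200, headers, data


status, headers, body = build_image_response("logo.png", b"\x89PNG")
print(status, headers["Content-Type"], len(body))
```

Text responses (JSON, JSP, HTML, CSS) built the same way are read by the GPT without problems, so the difference seems to be the binary image body itself.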

(If somebody from OpenAI reads this: please please PLEASE put the same care into the error messages given to ChatGPT. I have had several cases of errors, such as the one above, where ChatGPT gets a plain “ClientError” without any text describing the actual problem, so it simply cannot tell me what is wrong, and I had to keep banging my head against the wall and guessing until I guessed right.)

Thanks to OpenAI for all the marvellous stuff it provides, and thank you all for the interesting discussions here!
I wish you all a nice new year!