I’m building a custom GPT that should be able to comment on pictures and generate new pictures on demand.
I’ve tried to solve this in different ways. Currently I’m hosting a small server that serves a static picture. Anyone can see that picture, and I can share the link with anyone who wants it (the server uses a self-signed certificate for HTTPS).
My local test API calls work, and the direct link opens fine in the browser.
The GPT, on the other hand, refuses to see it or use the API in any way.
The same goes for API calls from custom GPT actions: my server doesn’t even detect the requests.
I can give additional information if needed.
Can someone explain why it’s not working?
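One thing worth checking first, given that the server doesn’t even detect the requests: ChatGPT’s action caller verifies TLS certificates against publicly trusted CAs, so a self-signed certificate will likely fail the TLS handshake before any HTTP request ever reaches the application layer, which would explain the empty server logs. A minimal sketch of the difference between a strict client and a permissive one, using Python’s `ssl` module:

```python
import ssl

# A strict HTTPS client (such as ChatGPT's action caller) builds a default
# context that requires a certificate signed by a trusted CA.
strict = ssl.create_default_context()
print(strict.verify_mode == ssl.CERT_REQUIRED)  # True: self-signed certs are rejected

# A permissive client (your browser after "proceed anyway", or curl -k)
# disables verification -- something you cannot do on ChatGPT's side.
permissive = ssl.create_default_context()
permissive.check_hostname = False  # must be disabled before CERT_NONE
permissive.verify_mode = ssl.CERT_NONE
print(permissive.verify_mode == ssl.CERT_NONE)  # True: any cert is accepted
```

If that is the cause, a certificate from a public CA (e.g. Let’s Encrypt) should make the requests start appearing in the server logs.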
Then the “browsing” tool is just going to scrape page data.
You still shouldn’t expect ChatGPT vision to “see” the contents of an image; the only way an image can be perceived is for it to be part of a user message to a GPT-4 model that supports vision. There is no other path for loading or analyzing images except on demand of the user (or an API user placing those images into a user-role message).
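To make that concrete, here is a sketch of the message shape the API expects when an image should be analyzed; the URL and question text below are placeholders for illustration, not taken from this thread:

```python
# The image must ride inside a user-role message; there is no other path
# by which a vision-capable model perceives image content.
image_url = "https://example.com/picture.png"  # placeholder URL

vision_message = {
    "role": "user",  # only user messages can carry images to the model
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
}
# This message would then go into the `messages` list of a chat completions
# request to a vision-capable GPT-4 model.
```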
I think you are right. I assumed it could see images, since I had used GPT to browse other sites, but it was actually only reading text.
So maybe I can use GPT Assistants for this? Though I doubt I will, as it has additional pricing.
@_j, so now I’ve added an API for text, but the GPT still can’t see it:
In my browser:
On the server:
My schema is the following:
title: Screenshot API
description: API to interact with a server that provides a static image and allows taking screenshots.
servers:
  - url: https://server
    description: Main server hosting the pictures and chat functionality.
summary: Retrieves the latest chat messages.
description: Returns the latest chat messages at the specified URL.
description: Latest chat messages.
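One thing that stands out in that fragment: custom GPT actions need a complete OpenAPI schema, including an `openapi` version, an `info.version`, a `paths` section, and an `operationId` for each operation, before ChatGPT will call the endpoint at all. A sketch of how the pieces above might fit together; the `/messages` path, the version numbers, and the `text/plain` response type are assumptions, not taken from this thread:

```yaml
openapi: 3.1.0
info:
  title: Screenshot API
  description: API to interact with a server that provides a static image and allows taking screenshots.
  version: 1.0.0
servers:
  - url: https://server
    description: Main server hosting the pictures and chat functionality.
paths:
  /messages:            # hypothetical path; substitute your real endpoint
    get:
      operationId: getLatestMessages   # required by Actions to invoke the call
      summary: Retrieves the latest chat messages.
      description: Returns the latest chat messages at the specified URL.
      responses:
        "200":
          description: Latest chat messages.
          content:
            text/plain:
              schema:
                type: string
```

If the schema validates in the GPT editor but the server still logs nothing, that points back at the TLS certificate rather than the schema.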