Hi.
I’m building custom GPT that should be able to comment on pictures and make new pictures on demand.
I’ve tried to solve this in different ways. Currently I’m hosting a small server and sharing a static picture. Everyone can see that picture and I can share the link for everyone if anyone wants(it uses self-signed cert for HTTPS).
My local test API calls and direct link in the browser are working perfectly fine.
On the other hand, GPT refuses to see it or use API in any way.
The “error browsing” may be further “untrusted” actions or untrusted http document types. You’ll need a verified builder domain, schema, authentication, privacy policy, etc. and the ability to then publish to “everyone” to then make those custom actions.
Then the tool “browsing” is going to scrape page data.
I still shouldn’t expect that ChatGPT vision can “see” the contents of an image; The only way that an image can be perceived is to have it be part of a user message to the GPT-4 model that supports computer vision. There is no other path of loading or analyzing images except on demand of the user (or an API user placing those images into a user role message).
I think you are right. I assumed it could see images, as I used GPT to browse other sites, but it was actually only text.
So maybe I can use GPT assistants for this? I doubt that I will use it, as it has additional pricing
openapi: 3.0.0
info:
title: Screenshot API
description: API to interact with a server that provides a static image and allows taking screenshots.
version: 1.0.0
servers:
- url: https://server
description: Main server hosting the pictures and chat functionality.
paths:
/chat:
get:
operationId: getChat
summary: Retrieves the latest chat messages.
description: Returns the latest chat messages at the specified URL.
responses:
"200":
description: Latest chat messages.
content:
text:
schema:
type: string