I am developing a custom GPT that will perform actions on a user's computer. This could be for data entry, tech support, or many other use cases.
One of the actions I would like to support is taking a screenshot and having GPT-4 Vision analyse it. I have an API endpoint that does this, but when my GPT calls it I get an error. Is this even possible? Here is my action JSON:
{
  "openapi": "3.1.0",
  "info": {
    "title": "Browser Automation API",
    "version": "1.0.0",
    "description": "API for performing various browser operations."
  },
  "servers": [
    {
      "url": "https://abc.com"
    }
  ],
  "paths": {
    "/screenshot": {
      "get": {
        "summary": "Take Screenshot",
        "description": "Takes a screenshot of the current page and returns the image.",
        "operationId": "takeScreenshot",
        "responses": {
          "200": {
            "description": "Screenshot taken successfully",
            "content": {
              "image/png": {}
            }
          },
          "default": {
            "description": "Error",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "status": {
                      "type": "string"
                    },
                    "message": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Has anyone managed to do anything similar? The API endpoint works and displays the current screenshot, so that side appears fine; it just seems the GPT is not ingesting the image.
(Text output from the same endpoint is ingested fine.)
Thanks and happy gpt’ing!
Justin