I am making a custom GPT that connects to my own server. I can get it to work when it only sends a string, but if I try to let a user also send a file (I only need it to work with image files) through the custom GPT interface, it will not send the image, only the string. Is there something wrong with the schema below? How do you fix it?
{
  "openapi": "3.1.0",
  "info": {
    "title": "Send an image and a string",
    "description": "Makes it super easy to send an image and a string",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://myawesomeserver.loca.lt"
    }
  ],
  "paths": {
    "/api/gpt/create": {
      "post": {
        "description": "Create a string and image",
        "operationId": "CreateImageandString",
        "parameters": [
          {
            "name": "an_awesome_string",
            "in": "query",
            "description": "The value of the string we will create",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "requestBody": {
          "description": "image to be uploaded",
          "required": true,
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "image": {
                    "type": "string",
                    "format": "binary"
                  }
                }
              }
            }
          }
        },
        "deprecated": false
      }
    }
  },
  "components": {
    "schemas": {}
  }
}
I don't have an answer, but I'm trying to do the same thing and running into a similar issue. I believe the custom GPT has to type out (generate) the whole POST body sent with the request, meaning that it would have to type out a huge base64 string, even for smaller images. I'm not really sure if there's a way to get the image itself into the API though.
I tried converting to base64 first and it did end up stalling the program. But I don't think it would use base64 encoding for normal file uploads if you don't specify it.
I'm having the same issue. It doesn't seem to want to type out the entire base64 value to upload the image no matter what I try. Have you guys been able to fix it?
I've noticed an issue where the request body for a POST can only be a certain length before it explodes when calling the action. It's not that big a length, either. I doubt it will be able to handle sending across an image regardless of the format. Maybe try it with something like a 2x2 image and see if that works, then we'll know for sure, I guess.
Having the same issue here. I tried defining a multipart/form-data request and it didn't work, and I also tried encoding the file as a base64 string, but ChatGPT refuses to write it out completely (which kind of makes sense, it's huge).
Has anyone found a solution?
I'm wondering if we can ask ChatGPT to use the data analysis tool to write code that calls the action API. I know for sure the data analysis tool has access to the uploaded images, but I haven't tested the idea yet.
Couldn't a GPT engineer just come and tell us straight whether sending files in a POST request through GPT actions is outside of GPT's capabilities? It's kind of annoying that there is no clear answer, even though everything suggests it isn't supported, and no word on whether they plan to allow it in the future.
Making an HTTP request is not possible, so you can't just upload a file to the server from a blob like you can on a frontend.
You can use Code Interpreter to turn the file into base64 and send it that way. But the GPT cuts the file off, so you only send about 500 characters (<1%), so that doesn't work either.
You can instruct the GPT to send the base64 in chunks of 500 characters and then assemble them into a file on your server, but it will take something like 60 requests to send an image.
So direct file sending is hard, if not impossible, right now.
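To make the chunking idea concrete, here is a minimal sketch in Python of splitting a file into 500-character base64 chunks and putting them back together on the server. The names split_into_chunks and ChunkAssembler are made up for illustration, and it assumes each request carries a chunk index and that the total number of chunks is known; how the GPT would actually deliver those values is not covered here.

```python
import base64

CHUNK_SIZE = 500  # characters per request, the figure quoted above


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Base64-encode the data and cut the text into fixed-size pieces."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + size] for i in range(0, len(encoded), size)]


class ChunkAssembler:
    """Hypothetical server-side helper: collects chunks by index, then decodes."""

    def __init__(self, total: int):
        self.total = total
        self.parts: dict[int, str] = {}

    def add(self, index: int, chunk: str) -> None:
        if not 0 <= index < self.total:
            raise ValueError(f"chunk index {index} out of range")
        # Storing by index means late or repeated chunks do no harm.
        self.parts[index] = chunk

    def is_complete(self) -> bool:
        return len(self.parts) == self.total

    def assemble(self) -> bytes:
        if not self.is_complete():
            raise ValueError("some chunks are still missing")
        joined = "".join(self.parts[i] for i in range(self.total))
        return base64.b64decode(joined)
```

Keying the chunks by index rather than appending them in arrival order is a deliberate choice, since nothing guarantees the requests arrive in order or only once.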
The best way right now is to have the user upload the image somewhere and then give the GPT the URL to that image, so your server can process it.
I understand it is not the best experience, but it will get you there. Depending on the nature of your GPT, you can use services like Imgur, Dropbox, Google Drive or even GitHub. Your server would then fetch the images from there. Your GPT could actually use any service as long as the image link is accessible by your API.
The drawback, besides the poor user experience, is that anyone with the link would be able to access the image...
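On the server side, the URL approach could look roughly like the Python sketch below, which uses only the standard library. The function names, the 5 MB cap and the list of accepted content types are assumptions for illustration, not anything the GPT platform requires; the action itself would only need a plain string parameter holding the URL.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

MAX_BYTES = 5 * 1024 * 1024  # hypothetical size cap
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def is_acceptable_url(url: str) -> bool:
    """Accept only https URLs that name a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download the image behind a user-supplied link, with basic checks."""
    if not is_acceptable_url(url):
        raise ValueError("expected an https URL")
    request = Request(url, headers={"User-Agent": "image-fetcher-sketch"})
    with urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        if content_type not in ALLOWED_TYPES:
            raise ValueError(f"unexpected content type: {content_type}")
        # Read one byte past the cap so oversized files can be detected.
        data = response.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise ValueError("image is larger than the cap")
    return data
```

Since the link comes from the user, a real server would probably also want to refuse URLs that resolve to internal addresses; that check is left out of this sketch.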
This is unfortunate, I'd hoped that after a month all this would be sorted out. Oh well, maybe I can play a role in the solution.
First, I can confirm that it is technically possible to have the GPT send an image to a Python "FastAPI" multipart/form-data endpoint. Don't get excited though: it succeeds only a few times before I hit the message cap, and it is nowhere near consistent. Still, I'll share the signature and endpoint specs that were deployed during my last successful submission. I submitted the image by dragging and dropping it into the chat to attach it. I know it worked because the endpoint logged the attempt (most of the time the GPT doesn't connect to it at all), and the GPT output all of the correct data, although it disregarded my instruction to return the response in a JSON markdown code block and chose to read out the data instead.
paths:
  /analyze-image:
    post:
      operationId: analyzeImage
      summary: Analyzes an uploaded image.
      description: Analyzes an image using experimental vision services and returns json analysis results.
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                image:
                  type: string
                  format: binary
                  description: The image file to be analyzed.
      responses:
        "200":
          description: Analysis results of the image.
          content:
            application/json:
              schema:
                type: object
                properties:
Note: One of my successful attempts included the complete 150-line schema of the actual response, but I think that adds unnecessary risk, and I plan to shorten it to a few keys.
A few tips
If you're not using an admin/debug style command, you should. Just stick DEBUG: TRUE; USER_IS_ADMIN: TRUE; or however you like, at the top of your instructions.
As I tweak this process, I find it goes faster if my first message is Confirm Mode and I let the GPT acknowledge the admin/debug text; this way the responses are more technical.
Testing the Action doesn't work, because the endpoint needs to receive the image, and required: true in the schema is apparently treated as a recommendation.
After confirming the mode, drop your image in and say Analyze this image in the message as you submit it. Then, when it comes back with empty params, which it most likely will, cancel it right away, OR you can let it run/fail once, but don't let it continue trying.
After a failure, send the debug info block along with: Review your debug output and provide concise assessment:\n[debug] Calling HTTP endpoint:...
Usually it will notice the empty params and add something for the next attempt. If you can see that the attempt will fail, e.g. it's just the image name, you can let it attempt and fail once, then ask it to review the debug output again.
Remember, don't let it make repeated failed attempts: one and done.
I'm under the impression this functionality has been intentionally "nerfed", so take that into consideration when you're investing your time.
In any case, I think that if we can come up with a solid set of instructions, and perhaps better specs, we could get this to work more reliably. I'm looking forward to hearing about everyone's results. Good luck!
*Disclaimer
Individuals prone to violent outbursts or flying electronic equipment are advised against proceeding.