How do I modify the schema of a custom GPT action to send an image file with a POST request?

I am making a custom GPT that connects to my own server. It works when I only send a string, but if I try to let a user also send a file through the custom GPT interface (I only need it to work with image files), it does not send the image, only the string. Is there something wrong with the schema below? How do I fix it?

```json
{
  "openapi": "3.1.0",
  "info": {
    "title": "Send an image and a string",
    "description": "Makes it super easy to send an image and a string",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://myawesomeserver.loca.lt"
    }
  ],
  "paths": {
    "/api/gpt/create": {
      "post": {
        "description": "Create a string and image",
        "operationId": "CreateImageandString",
        "parameters": [
          {
            "name": "an_awesome_string",
            "in": "query",
            "description": "The value of the string we will create",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "requestBody": {
          "description": "image to be uploaded",
          "required": true,
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "image": {
                    "type": "string",
                    "format": "binary"
                  }
                }
              }
            }
          }
        },
        "deprecated": false
      }
    }
  },
  "components": {
    "schemas": {}
  }
}
```

I don’t have an answer, but I’m trying to do the same thing and running into a similar issue. I believe the custom GPT has to type out (generate) the whole POST body sent with the request, which means it would have to type out a huge base64 string, even for small images. I’m not sure there’s a way to get the image itself into the API at all.
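To put “huge” in numbers: base64 turns every 3 input bytes into 4 output characters (padding the last group), so the text the GPT would have to generate is about a third longer than the file itself. A quick standard-library sketch; the file sizes below are illustrative, not measurements from this thread:

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes bytes."""
    # Every 3-byte group becomes 4 characters; a partial final group is padded to 4.
    return 4 * math.ceil(num_bytes / 3)


# Sanity-check the formula against the real encoder.
for n in (0, 1, 2, 3, 4, 1000):
    assert base64_length(n) == len(base64.b64encode(b"\x00" * n))

# Illustrative file sizes (hypothetical, not from the thread).
for size in (2_000, 50_000, 500_000):
    print(f"{size:>7} bytes -> {base64_length(size):>7} base64 characters")
```

By this formula a 50,000-byte image already needs 66,668 characters of output, all of which the model would have to reproduce exactly.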


I tried converting to base64 first and it did end up stalling the program. But I don’t think it would use base64 encoding for normal file uploads unless you tell it to.
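That matches how multipart/form-data works: the file part carries the raw image bytes, with no base64 step unless a client chooses to add one. Below is a minimal standard-library sketch of the body a normal HTTP client would send for the image field in the schema above; the encode_multipart helper, the stand-in bytes and the filename are hypothetical, for illustration only:

```python
import uuid


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "image/png") -> tuple[bytes, str]:
    """Build a multipart/form-data body holding a single file field.

    Returns (body, value for the Content-Type header). Hypothetical helper.
    """
    boundary = uuid.uuid4().hex  # must not occur inside the file data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The file bytes go in verbatim: no base64 encoding anywhere.
    body = head + data + tail
    return body, f"multipart/form-data; boundary={boundary}"


png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # stand-in bytes, not a real image
body, header = encode_multipart("image", "photo.png", png_bytes)
assert png_bytes in body  # the raw bytes appear unchanged in the request body
print(header)
print(len(png_bytes), "file bytes ->", len(body), "body bytes")
```

To exercise your own server outside the GPT, you could send this body with urllib.request.urlopen(urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")), where url is the /api/gpt/create endpoint from the question including its required an_awesome_string query parameter. If the server accepts that request, at least the server side is confirmed to work.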

I’m having the same issue. It doesn’t seem to want to type out the entire base64 value to upload the image no matter what I try. Have you guys been able to fix it?

I’ve noticed that the request body for a POST can only be a certain length before the action call fails, and it’s not that big a length, either. I doubt it can handle sending an image regardless of the format. Maybe try it with a 2x2 image and see if that works; then we’ll know for sure, I guess.
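For that 2x2 experiment, a tiny valid PNG can be generated with only the standard library (zlib and struct), which also tells you exactly how long its base64 form is. A minimal sketch; the solid red color and the helper names are arbitrary choices for illustration:

```python
import base64
import struct
import zlib


def png_chunk(tag: bytes, payload: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, payload, CRC-32 over type + payload.
    return (struct.pack(">I", len(payload)) + tag + payload
            + struct.pack(">I", zlib.crc32(tag + payload) & 0xFFFFFFFF))


def solid_png(width: int, height: int, rgb=(255, 0, 0)) -> bytes:
    """Encode a solid-color, 8-bit RGB PNG."""
    # width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + bytes(rgb) * width  # filter byte 0 (None) followed by pixels
    idat = zlib.compress(row * height)
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat)
            + png_chunk(b"IEND", b""))


for side in (2, 16, 32):
    data = solid_png(side, side)
    encoded = base64.b64encode(data).decode("ascii")
    print(f"{side}x{side}: {len(data)} bytes, {len(encoded)} base64 characters")
```

A single color compresses extremely well, so a real photo at 16x16 or 32x32 will be noticeably larger than these numbers.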

I convert the image into a base64-encoded data URL. I can transfer images of 16x16, but it does not work even for images of 32x32 or larger. :frowning:
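If you take the data URL route, the server has to split off the data:image/...;base64, prefix before decoding. A minimal standard-library sketch; make_data_url and parse_data_url are hypothetical helper names, and only the base64 form of data URL is handled:

```python
import base64
import binascii


def make_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode bytes as a base64 data URL: data:<mime>;base64,<payload>."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media type, decoded bytes).

    Raises ValueError for anything that is not a well-formed base64 data URL.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a ';base64,' data URL")
    mime = header[: -len(";base64")] or "text/plain"  # may still carry parameters
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # truncation that breaks the padding also raises here.
        return mime, base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"bad base64 payload: {exc}") from exc


url = make_data_url(b"\x89PNG fake bytes")
print(url)
print(parse_data_url(url))
```

Validating strictly means a payload the model cut short usually fails loudly instead of producing a corrupt image file.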

Having the same issue here. I tried defining a multipart/form-data request, which didn’t work, and I also tried encoding the file as a base64 string, but ChatGPT refuses to write it out completely (which kind of makes sense; it’s huge).
Has anyone found a solution?

I’m wondering whether we could ask ChatGPT to use the data analysis tool to write code that calls the action API. I know for sure the data analysis tool has access to uploaded images, but I haven’t tested the idea yet.

Right now the code interpreter has no internet access, so it cannot use the requests library (or anything else) to send or receive data over the network.

Couldn’t a GPT engineer just come and answer directly whether sending files in a POST request via GPT actions is outside GPT’s capabilities? It’s kind of annoying that there is no clear answer, even though everything points to it not being possible, or any word on whether they plan to allow it in the future.


There is a clear answer to this. Here it is:

  1. Making HTTP requests from the code interpreter is not possible, so you can’t just upload a file to the server from a blob like you can on a frontend.
  2. You can use the code interpreter to turn the file into base64 and send it that way, but the GPT truncates it, so you only send about 500 characters (<1% of the file). So that doesn’t work either.
  3. You can instruct the GPT to send the base64 in chunks of 500 characters and then reassemble the file on your server, but it takes around 60 requests to send one image.

So direct file sending is hard, if not impossible, right now.
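The chunking workaround in point 3 needs the server to buffer the pieces and decode only once every chunk has arrived. A minimal in-memory sketch; ChunkStore and the upload_id / index / total fields are hypothetical names you would also have to declare as parameters in the action schema, and a real server would need persistent storage and expiry of abandoned uploads:

```python
import base64
from typing import Optional


class ChunkStore:
    """Collect base64 chunks per upload and decode once all have arrived."""

    def __init__(self) -> None:
        self._uploads: dict[str, dict[int, str]] = {}
        self._totals: dict[str, int] = {}

    def add(self, upload_id: str, index: int, total: int, chunk: str) -> Optional[bytes]:
        """Store one chunk; return the decoded file once complete, else None."""
        if not 0 <= index < total:
            raise ValueError(f"chunk index {index} out of range for total {total}")
        if self._totals.setdefault(upload_id, total) != total:
            raise ValueError("total changed mid-upload")
        parts = self._uploads.setdefault(upload_id, {})
        parts[index] = chunk  # a resent chunk simply overwrites the earlier copy
        if len(parts) < total:
            return None
        # All chunks present: join in index order, free the buffers, then decode.
        joined = "".join(parts[i] for i in range(total))
        del self._uploads[upload_id], self._totals[upload_id]
        return base64.b64decode(joined, validate=True)


# Usage: split a base64 string into 8-character chunks and feed them in order.
payload = base64.b64encode(b"hello image bytes").decode("ascii")
chunks = [payload[i:i + 8] for i in range(0, len(payload), 8)]
store = ChunkStore()
results = [store.add("demo", i, len(chunks), c) for i, c in enumerate(chunks)]
print(results[-1])
```

At 500 characters per request, 60 requests carry 30,000 base64 characters, which decode to at most 22,500 bytes, so even this route only suits small images.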

Is anyone able to sort this out please?

We are stuck on the same issue.

Thank you.

I hope this can be addressed soon too. I think the ability to send images via actions would open the door to many great possibilities.

If there’s no way to send an uploaded image to an action, how are some other GPTs implemented?


The best way right now is to have the user upload the image somewhere and then give the GPT the URL of that image, so your server can process it.

I understand it is not the best experience, but it will get you there. Depending on the nature of your GPT, you can use services like Imgur, Dropbox, Google Drive or even GitHub. Your server would then fetch the images from there. Your GPT could actually use any service, as long as the image link is accessible to your API.

The drawback, besides the less-than-ideal user experience, is that anyone with the link would be able to access the image …
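On the server side, the action then only needs a plain string parameter holding the URL (for example a hypothetical image_url field with "type": "string" in the schema), and the server downloads the image itself. A minimal standard-library sketch; the fetch_image name and the 10 MB limit are assumptions, and the opener parameter exists only so the function can be tested without network access:

```python
import urllib.request
from urllib.parse import urlparse

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed limit; tune for your use case


def fetch_image(url: str, max_bytes: int = MAX_IMAGE_BYTES,
                opener=urllib.request.urlopen) -> tuple[str, bytes]:
    """Download an image URL supplied by the GPT; return (content type, bytes)."""
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        # Refuse file://, ftp://, data: and other schemes the GPT might pass along.
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    with opener(url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if not content_type.startswith("image/"):
            # A share link may serve an HTML preview page instead of the file.
            raise ValueError(f"URL did not return an image: {content_type!r}")
        data = resp.read(max_bytes + 1)  # one extra byte reveals oversize files
    if len(data) > max_bytes:
        raise ValueError(f"image larger than {max_bytes} bytes")
    return content_type, data
```

Because the server now fetches whatever URL the GPT hands it, you would also want to stop it from reaching internal addresses (for example by resolving the host and rejecting private IP ranges), which this sketch does not do.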

There is no evidence that the images are sent to the actions for processing.

Big images are fine as long as they are not meant to be sent to endpoints.

This is unfortunate; I’d hoped that after a month all this would be sorted out. Oh well, maybe I can play a role in the solution. :melting_face:

First, I can confirm that it is technically possible to have the GPT send an image to a Python “FastAPI” multipart/form-data endpoint. Don’t get excited, though: it succeeded only a few times before I hit the message cap, and it is nowhere near consistent. Still, I’ll share the endpoint signature and spec I had deployed for my last successful submission. For that submission I attached the image by dragging and dropping it into the chat. I know it worked because the endpoint logged the request (most of the time the GPT doesn’t connect to the endpoint at all), and the GPT returned all of the correct data, although it ignored my instruction to return the response in a JSON markdown code block and read the data out instead.

Endpoint signature


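```python
import fastapi

app = fastapi.FastAPI()

# Parses the multipart/form-data field "image" (requires the python-multipart package).
@app.post("/analyze-image")
async def analyze_image(image: fastapi.UploadFile = fastapi.File(...)):
    ...  # analysis logic not shown in the original post
```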

Important bits of the spec:

```yaml
paths:
  /analyze-image:
    post:
      operationId: analyzeImage
      summary: Analyzes an uploaded image.
      description: Analyzes an image using experimental vision services and returns json analysis results.
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                image:
                  type: string
                  format: binary
                  description: The image file to be analyzed.
      responses:
        "200":
          description: Analysis results of the image.
          content:
            application/json:
              schema:
                type: object
                properties:
```

Note: One of my successful attempts had the complete 150-line schema of the actual response, but I think that adds unnecessary risk, and I plan to shorten it to a few keys.

A few tips

  • If you’re not using an admin/debug-style command, you should. Just put DEBUG: TRUE; USER_IS_ADMIN: TRUE; (or whatever you like) at the top of your instructions.
  • As I tweak this process, I find it goes faster if my first message is “Confirm Mode” and I let the GPT acknowledge the admin/debug text; that way the responses are more technical.
  • Testing the action doesn’t work, because the endpoint needs to receive the image, and required: true in the schema is apparently treated only as a recommendation.
  • After confirming the mode, drop your image in and type “Analyze this image” in the message as you submit it. When it comes back with empty params (which it most likely will), cancel it right away, or let it run and fail once, but don’t let it keep trying.
  • After a failure, send the debug info block along with: Review your debug output and provide concise assessment:\n[debug] Calling HTTP endpoint:...
  • Usually it notices the empty params and adds something for the next attempt. If you can see it will fail (e.g. it’s sending only the image name), you can let it attempt and fail once, then ask it to review the debug output again.
  • Remember, don’t let it make repeated failed attempts: one and done.
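When the GPT comes back with empty params, or sends only the image name, it helps to log exactly what reached the server. A minimal standard-library sketch that lists each form field in a raw multipart/form-data body and whether it carried an actual file; describe_multipart is a hypothetical helper, and it relies on the email package’s MIME parser rather than any FastAPI API, so you would feed it the raw request body and Content-Type header from your framework or logs:

```python
from email import policy
from email.parser import BytesParser


def describe_multipart(body: bytes, content_type: str) -> list[dict]:
    """Summarize the parts of a raw multipart/form-data request body."""
    # The email parser expects the Content-Type header (with its boundary) first.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.HTTP).parsebytes(raw)
    if not message.is_multipart():
        return []  # not multipart at all, e.g. the GPT sent JSON or nothing
    summary = []
    for part in message.iter_parts():
        payload = part.get_payload(decode=True) or b""
        summary.append({
            "field": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),  # None for plain text fields
            "content_type": part.get_content_type(),
            "size": len(payload),
        })
    return summary
```

A real upload shows up as an "image" entry with a filename, an image/* content type and a non-trivial size; a name-only attempt shows a tiny text field instead.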

I’m under the impression this functionality has been intentionally “nerfed”, so take that into consideration before investing your time.

In any case, I think that if we can come up with a solid set of instructions, and perhaps better specs, we could get this to work more reliably. I’m looking forward to hearing about everyone’s results. Good luck!

*Disclaimer
Individuals prone to violent outbursts or flying electronic equipment are advised against proceeding.

Same problem, hope there is someone who can solve it.