Confusion reading docs as a new developer and gpt4 vision api help

Hi everyone, I am currently toying with the GPT-4 Vision API and want to send a request where the image comes from a file upload input on my front end, or is taken in real time with the user's webcam/phone.

Here is an excerpt of what I'm sending:

  {
    role: "user",
    content: [
      {
        type: "text",
        text: "Please do XYZ and generate a detailed response of at least 500 words.",
      },
      {
        type: "image_url",
        image_url: {
          url: "https://www.url-goes-here.com/image",
        },
      },
    ],
  },

My first question is: where in the docs/reference do I find what other types are allowed (besides the image_url type)? My second question is: what type would I use if I want to send the image file the user selects or takes with their webcam/phone? There is no “url”. Would I have to save the image to a database and pull it from there?

Welcome back.

From the docs…

Images are made available to the model in two main ways: by passing a link to the image or by passing the base64 encoded image directly in the request.
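So for an uploaded or freshly captured image you should not need a database or a hosted URL: the base64 route still uses the image_url type, with a data URL in place of the link. Here is a rough sketch of that, assuming Node.js (for Buffer) and that you already have the raw image bytes and their MIME type. The helper names toDataUrl and buildVisionMessage are made up for illustration and are not part of any SDK.

```javascript
// Hypothetical helper: encode raw image bytes as a base64 data URL.
// Assumes a Node.js runtime, where Buffer is available globally.
function toDataUrl(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: build the same message shape as in the question,
// but with the data URL in place of a hosted image link.
function buildVisionMessage(prompt, bytes, mimeType) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: toDataUrl(bytes, mimeType) } },
    ],
  };
}
```

In the browser, FileReader's readAsDataURL gives you the same kind of data URL string straight from the file input, so you could pass that result as the url value instead of encoding the bytes yourself.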

Also from the docs FAQ section…

What type of files can I upload?

We currently support PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).

Link to GPT-4 vision quickstart guide…

Hope this helps!
