Can GPT vision models be accessed through the API?

I need to upload images for GPT to analyze, so I need to call a GPT vision model. But GPT-4o tells me that this model is not available. Is that so?
Question: How do I upload images directly to GPT so it can read and analyze the information in them? For example, is the URL method okay? For cost reasons, I would rather not use base64 encoding in this project.


Welcome to the Developer Forum!

Could you specify the error you are experiencing?

You can use either gpt-4o or gpt-4-turbo for vision. The Python code for a basic API call with either model is as follows:

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY","REPLACE WITH YOUR API KEY"))

response = client.chat.completions.create(
  model="gpt-4o",  # alternatively use gpt-4-turbo
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

Source: https://platform.openai.com/docs/guides/vision


Thanks for the reply. I'm a beginner in programming, so thank you for your help. I will try the solution you provided.
Thanks again!


Although you mentioned that base64 encoding is not necessary for cost reasons, there is no difference in the cost of API calls whether you use base64 encoding or pass a URL.

You can pass an image to the model via API using a URL, but in that case, you will need to host the image as a publicly accessible URL.

The costs associated with using the vision feature depend on:

  • Whether the image is processed in high-detail (high-resolution) mode,
  • If so, the image's resolution,
  • The total tokens for the system message, user message, and the model's response (assistant's output) describing the image.

Pricing: https://openai.com/api/pricing/
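As a rough illustration of how resolution feeds into the image cost, the sketch below follows the tile-based calculation described in OpenAI's vision guide around the time of this thread: "low" detail is a flat 85 tokens; "high" detail first fits the image inside 2048×2048, then scales the shortest side down to 768 px, and charges 170 tokens per 512 px tile plus 85. These constants, and the assumption that small images are never enlarged, are taken as given here and may have changed, so treat the result as an estimate and check the pricing page before relying on it.

```python
import math

# Assumed constants from OpenAI's vision guide for gpt-4o at the time of this
# thread; they may have changed, so check the guide/pricing page before use.
BASE_TOKENS = 85        # flat cost per image (the whole cost in "low" detail)
TOKENS_PER_TILE = 170   # extra cost per 512 x 512 tile in "high" detail
TILE_SIZE = 512


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Rough estimate of the input tokens charged for one image."""
    if detail == "low":
        return BASE_TOKENS

    # 1. Shrink (never enlarge) the image so it fits inside 2048 x 2048.
    scale = min(1.0, 2048 / max(width, height))
    width, height = round(width * scale), round(height * scale)

    # 2. Shrink (never enlarge) it so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = round(width * scale), round(height * scale)

    # 3. Count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


print(estimate_image_tokens(1900, 1900))         # 765
print(estimate_image_tokens(1900, 1900, "low"))  # 85
```

The `detail` argument here mirrors the optional `"detail"` field inside `"image_url"` in the request (`"low"`, `"high"`, or `"auto"`); for worksheets with small handwriting, low detail may be too coarse, so it is worth testing both.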

If hosting the image on a server just for the model to reference, and making it publicly accessible on the internet as a URL, poses no problem in terms of effort or security, that would be fine. If it does, consider using base64 encoding instead:

import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {api_key}"
}

payload = {
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())

This is how you can pass an image to the model using base64 encoding. It is the same method used when attaching an image in the Playground.
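The example above hard-codes `image/jpeg` in the data URL. If some scanned papers are PNG files instead, a small helper can choose the MIME type from the file extension. This is a minimal sketch using only the standard library; `to_data_url` is a hypothetical helper name, and it assumes each file's extension matches its actual format.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Return a base64 data URL for a local image, for the "image_url" field."""
    # Guess the MIME type (e.g. image/png, image/jpeg) from the file extension.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot tell the image type of {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

For example, the `"url"` value in the payload could be set to `to_data_url(image_path)` instead of building the string by hand.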

Thank you very much for helping a beginner. I use GPT to mark students' test papers to help them learn more efficiently.
For example, I need to identify the knowledge points they got wrong and count how often each one appears, so that students can spend more time on those points.
Therefore, the confidentiality requirements are relatively low.
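For the counting step, once the model's answers have been turned into a list of wrong knowledge points per paper (that parsing step is assumed here, and the data below is made up for illustration), Python's `collections.Counter` can tally how often each point appears. `tally_wrong_points` is a hypothetical helper name.

```python
from collections import Counter

# Made-up, already-parsed model output: the knowledge points marked wrong
# on each test paper (one list per paper).
wrong_points_per_paper = [
    ["fractions", "negative numbers"],
    ["fractions", "area of a circle"],
    ["fractions", "negative numbers", "word problems"],
]


def tally_wrong_points(papers: list[list[str]]) -> Counter:
    """Count how many times each knowledge point was marked wrong."""
    counts = Counter()
    for points in papers:
        counts.update(points)  # every occurrence on a paper adds 1
    return counts


# Most frequent weak points first.
for point, count in tally_wrong_points(wrong_points_per_paper).most_common():
    print(f"{point}: {count}")
```

If a point should count only once per paper even when it appears several times, pass `set(points)` to `update` instead.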


Anyone who knows the URL will be able to access the image.
Please consider this and determine if it truly poses no problem.

Thanks again.
There are no names on the papers; the students are just in Grades 1-9, so I think it's OK.
My students' families have average incomes and can't afford high costs.

If you have any suggestions, please tell me.

Using a base64 encoded image object for API calls incurs no additional cost.

I am not sure what the benefit would be of hosting an image on a server as a publicly accessible URL.

Advantages of sending a base64 encoded image object:

  • There is no need to make the image public.

Potential disadvantages of sending a base64 encoded image object:

  • If you are on a metered connection, such as mobile data, the larger payload might lead to higher data charges. This is not a concern on a fixed-line connection.
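To put a number on the payload increase: base64 turns every 3 bytes into 4 ASCII characters, so the encoded image is about a third larger than the file itself. A quick check with the standard library, using random bytes as a stand-in for an arbitrary 600 KB image:

```python
import base64
import os

raw = os.urandom(600_000)        # stand-in for a ~600 KB image file
encoded = base64.b64encode(raw)  # what actually goes into the JSON payload

print(len(raw), len(encoded))                          # 600000 800000
print(f"overhead: {len(encoded) / len(raw) - 1:.0%}")  # overhead: 33%
```

This extra size affects only the upload, not the API bill.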

Advantages of hosting an image and making it publicly accessible via URL:

  • It can be convenient when presenting something already publicly available as a URL, like the Wikipedia example mentioned above.

Disadvantages of hosting an image and making it publicly accessible via URL:

  • The additional effort and cost of hosting the image.
  • Anyone who knows the URL will be able to access the image.

Since there is no difference in the API usage fee, I think it is better not to make the image public unless there is a reason to do so.

My images are 1900×1900 pixels.

  1. According to the GPT pricing page, sending one by URL should cost US$0.003825. But I have not yet managed to send an image to GPT that way, so I can't check the actual cost yet.
  2. When I sent it with base64 encoding, it consumed more than 100,000 tokens.
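The US$0.003825 figure is consistent with a 1900×1900 image costing 765 input tokens in high detail, at what was then GPT-4o's input price of US$5.00 per million tokens (that price is an assumption here; check the current pricing page). The same arithmetic shows what 100,000 input tokens would cost:

```python
# Assumed GPT-4o input price at the time of this thread (USD per 1M tokens);
# check https://openai.com/api/pricing/ for the current value.
INPUT_PRICE_PER_MILLION = 5.00


def input_cost_usd(tokens: int, price_per_million: float = INPUT_PRICE_PER_MILLION) -> float:
    """Convert a number of input tokens into US dollars."""
    return tokens * price_per_million / 1_000_000


print(f"{input_cost_usd(765):.6f}")      # 0.003825 (one 1900x1900 image, high detail)
print(f"{input_cost_usd(100_000):.2f}")  # 0.50 (the 100,000 tokens reported above)
```

100,000 tokens would be more than 100 times the expected cost of the image alone, which is why that figure is worth double-checking.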

So are you saying that sending it by URL would actually consume 100,000 tokens as well?

Here are the results of an API call with a base64 encoded image object of the same resolution, 1900×1900.

I can’t find the figure of 100,000 tokens anywhere.

The actual cost is the sum of these total tokens plus the cost for the vision feature.
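To check the real token counts for your own calls, the Chat Completions response includes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The sketch below reads it from the dictionary returned by `response.json()` in the `requests` example earlier in this thread. The prices are assumptions (GPT-4o at the time: US$5.00 input and US$15.00 output per million tokens), and the sample dictionary is made up for illustration. Comparing `prompt_tokens` for the same prompt with and without the image is a simple way to see how the image itself is counted.

```python
# Assumed GPT-4o prices at the time of this thread (USD per 1M tokens).
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 15.00


def cost_from_usage(response_json: dict) -> float:
    """Estimate the USD cost of one call from the token usage it reports."""
    usage = response_json["usage"]
    input_cost = usage["prompt_tokens"] * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = usage["completion_tokens"] * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Made-up example of the "usage" part of a response.json() result.
sample = {"usage": {"prompt_tokens": 1000, "completion_tokens": 200, "total_tokens": 1200}}
print(sample["usage"]["total_tokens"], f"${cost_from_usage(sample):.4f}")
```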

I couldn’t find a publicly available image URL with the same resolution, so I’ve also included a test with a differently sized image URL.

I understand the concerns, but there should be no cost difference between sending the image as a URL and sending it base64-encoded in the API payload, since either way the model reads and processes the image on its side.