I need to upload images for GPT analysis, which means calling a GPT vision model. But when I call GPT-4o, I get a message that this model is not open to me. Is that so?
Question: How do I upload images directly to GPT so it can read and analyze the information? For example, is the URL method okay? For cost reasons, base64 encoding is not needed for this project.
Welcome to the Developer Forum!
Could you specify the error you are experiencing?
You can use either gpt-4o or gpt-4-turbo for vision. The Python code for a basic API call to either model is as follows:
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "REPLACE WITH YOUR API KEY"))

response = client.chat.completions.create(
    model="gpt-4o",  # alternatively use gpt-4-turbo
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Thanks for the reply. I'm a beginner in programming, so thanks for your help. I will try the solution you provided.
Thanks again.
Although you mentioned that base64 encoding is not necessary for cost reasons, there is no difference in the cost of API calls whether you use base64 encoding or pass a URL.
You can pass an image to the model via API using a URL, but in that case, you will need to host the image as a publicly accessible URL.
The costs associated with using the vision feature depend on:
- whether the image is processed at high resolution,
- if so, the image's resolution,
- the total tokens for the system message, the user message, and the model's response (the assistant's output) describing the image.
https://openai.com/api/pricing/
If there is no issue with hosting the image on a server just for the model to reference, and making it publicly accessible on the internet as a URL in terms of effort or security risks, that would be fine. However, if that poses a problem, perhaps consider using base64 encoding?
import base64
import requests

# OpenAI API key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
print(response.json())
This is how you can pass an image to the model using base64 encoding. It is the same method used when attaching an image in the Playground.
Thank you very much for your help to a beginner. I use GPT to mark students' test papers and help them improve their learning efficiency.
For example, I need to correct the knowledge points they get wrong and count how often each one appears. This lets students devote more time to studying those weak points.
Therefore, confidentiality requirements are relatively low.
Anyone who knows the URL will be able to access the image.
Please consider this and determine if it truly poses no problem.
Thanks again.
There are no names on the papers; they are just from grade 1-9 students. I think it's OK.
My students' financial conditions are average, so they cannot bear excessive costs.
If you have any suggestions, please tell me.
Using a base64 encoded image object for API calls incurs no additional cost.
I am wondering about the benefits of hosting an image on a server as a publicly accessible URL.
Advantages of sending a base64 encoded image object:
- There is no need to make the image public.
Potential disadvantages of sending a base64 encoded image object:
- If you are using a metered connection, such as packet communication, the payload increases, which might lead to higher communication charges. This is not a concern if you are using a fixed-line connection.
Advantages of hosting an image and making it publicly accessible via URL:
- It can be convenient when presenting something already publicly available as a URL, like the Wikipedia example mentioned above.
Disadvantages of hosting an image and making it publicly accessible via URL:
- The additional effort and cost of hosting the image.
- Anyone who knows the URL will be able to access the image.
Since there is no difference in the API usage fee, I think it is better not to make the image public unless there is a reason to do so.
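To make that choice concrete, here is a minimal sketch (the helper name `build_image_part` is made up for illustration) that builds the image part of the message either way, so the rest of the request code stays the same whether the image is public or local. Note the base64 form inflates the payload by about 33% (4 output bytes per 3 input bytes), which only matters on a metered connection:

```python
import base64
import mimetypes

def build_image_part(source: str) -> dict:
    """Build the image_url content part for a chat completion.

    Uses the URL directly for http(s) sources; otherwise reads the
    local file and embeds it as a base64 data URI, so nothing has to
    be hosted publicly.
    """
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": source}}
    # Guess the MIME type from the extension; fall back to JPEG.
    mime = mimetypes.guess_type(source)[0] or "image/jpeg"
    with open(source, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}
```

The returned dict drops straight into the `"content"` list of a user message, as in the payload examples above.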
For one picture, I calculate the pixels to be 1900×1900.
- The cost of transferring via URL should be US$0.003825 according to the GPT pricing. But I have not successfully sent an image to GPT that way, so there is no way to know the actual cost at this time.
- When I used base64 encoding for transmission, it consumed more than 100,000 tokens.
So are you saying that if I use URL transmission, 100,000 tokens will also actually be consumed?
Here are the results of an API call with a base64-encoded image object of the same resolution, 1900×1900.
I can't find the figure of 100,000 tokens anywhere.
The actual cost is the sum of these total tokens plus the cost for the vision feature.
I couldn't find a publicly available image URL with the same resolution, so I've also included a test with a differently sized image URL.
I understand the concerns, but there should be no cost difference between sending a URL in the API payload and the model processing it, as the model reads and processes the image on its side.
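For reference, the high-detail image token count can be estimated from the tiling rule on OpenAI's pricing page: the image is scaled to fit within 2048×2048, then its shortest side is scaled to 768, and the cost is 170 tokens per 512-px tile plus an 85-token base. This is a rough sketch (the exact rule may change between models), and the count is identical whether the image arrives as a URL or as base64:

```python
import math

def vision_tokens(width: int, height: int) -> int:
    """Estimate high-detail image tokens from OpenAI's published tiling rule."""
    # Step 1: fit within a 2048 x 2048 square, preserving aspect ratio.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 2: scale so the shortest side is at most 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 3: 170 tokens per 512 x 512 tile, plus an 85-token base.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85

# A 1900x1900 image downscales to 768x768 -> 4 tiles.
print(vision_tokens(1900, 1900))  # -> 765
```

For 1900×1900 this gives 765 tokens, which at the gpt-4o input rate of roughly US$5 per million tokens at the time is the US$0.003825 figure quoted earlier, nowhere near 100,000 tokens.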
Hi dignity_for_all,
I tried the code you shared and encountered an issue. When I send an image and a prompt through the API, I get incorrect responses. However, when I test the same prompt and image manually on the website, I consistently receive the correct response.
I've tried multiple times using the API, and the response changes each time, but manually on the website the response remains consistent and correct. So far, I haven't been able to get the correct response via the API.
Do you have any idea how to address this issue? What could be the differences between the API and manual usage on the website? Aren't they supposed to provide the same service?
Thanks!
No, they are not supposed to be the same. The API has its own models.
Variety in the outputs generated is the default; if you want higher reliability with less creative variation between runs, you'd lower the top_p or temperature API parameters from 1.0 to much closer to 0.0.
The main thing missing in the example is a system message. Provide a system message first, along the lines of "You are an image inspector, providing visual analysis of user pictures with your own computer vision skill", or similar to suit the application.
Hi, thank you for the explanation.
I'm familiar with the parameters you mentioned, and I've already tried using them. Unfortunately, I'm still experiencing low success rates.
Here's the request I'm sending to the API:
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300,
    "temperature": 0,
    "top_p": 0.01
}
Do you have any suggestions to improve the reliability?
Here's the updated payload from your code, with the high-quality system role message included that I previously described as necessary:
payload = {
    "model": "gpt-4o-2024-11-20",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are an image AI assistant. Your task is to provide visual "
                "analysis of user-submitted pictures, leveraging your computer "
                "vision skills. The user may include multiple messages or "
                "attachments, such as text, images, or combined inputs. You are "
                "capable of handling a wide variety of tasks and must generate "
                "responses tailored to the content and instructions provided. "
                "Ensure your answers are precise, clear, and adapted to the "
                "user's request, whether it involves analysis, description, or "
                "other image understanding tasks."
            )
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_completion_tokens": 1500,
    "temperature": 0.1,
    "top_p": 0.1
}
Explanation of changes:
- System role message:
  - Clear and robust description of the AI's purpose and capabilities.
  - Emphasizes precision, adaptability, and versatility in fulfilling tasks.
- Line splitting:
  - Used a string concatenation method with parentheses for long lines.
  - Each line is kept under 70 characters to improve forum code readability.
- Parameters updated:
  - Highly deterministic output is usually undesired.
  - No need to have the output cut off by under-specification; the newest parameter name max_completion_tokens is used.
  - There are three different versioned gpt-4o models, as well as gpt-4-turbo, each with different qualities (and costs). I changed to the newest gpt-4o.

This ensures high-quality prompt-following and clear application specialization.
Your user message "prompt" should also be tailored well. Then look at the image itself: ensure that after the API's internal mandatory downsize (so the shortest side is at most 768 pixels) it is still clear, and that it is being sent as a base64 file, not as raw image data.
(Bonus: always do the downsize yourself, using high-quality Lanczos resampling.)
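A minimal sketch of that client-side downsize using Pillow (assuming Pillow is installed; the function name `downsize_for_vision` is made up for illustration). Shrinking locally avoids base64-encoding pixels the API would discard in its own mandatory downsize anyway:

```python
from PIL import Image

def downsize_for_vision(path_in: str, path_out: str, shortest: int = 768) -> None:
    """Downscale an image so its shortest side is at most `shortest` px,
    using high-quality Lanczos resampling, before base64-encoding it."""
    img = Image.open(path_in)
    w, h = img.size
    if min(w, h) > shortest:
        scale = shortest / min(w, h)
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.save(path_out, quality=90)
```

Run this on each test-paper scan first, then feed the output file to the `encode_image` function from the earlier example.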
Thanks,
I've tried, but unfortunately I haven't achieved a high success rate yet.
Should I consider using a gpt-4-vision model for this? Is such a model available? I read that it's well-suited for image analysis, but the information I found might be outdated.
Do you have any updated insights or recommendations on this?
The answer is for you to discover:
The previous model name gpt-4-vision-preview and its aliases have been shut off (for most everybody). The latest gpt-4-turbo points to an April 2024 model that supports vision without needing it in the name.