I need to upload images for GPT analysis, which means calling a GPT vision model. But when I call GPT-4o, I get a message that this model is not open to me. Is that so?
Question: How do I upload images directly to GPT so it can read and analyze the information? For example, is the URL method okay? For cost reasons, base64 encoding is not needed for this project.
Welcome to the Developer Forum!
Could you specify the error you are experiencing?
You can use either gpt-4o or gpt-4-turbo for vision. The Python code for a basic API call to either model is as follows:
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "REPLACE WITH YOUR API KEY"))

response = client.chat.completions.create(
    model="gpt-4o",  # alternatively use gpt-4-turbo
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Thanks for the reply. I'm a beginner in programming, so thanks for your help. I will try the solution you provided.
Thanks again.
Although you mentioned that base64 encoding is not necessary for cost reasons, there is no difference in the cost of API calls whether you use base64 encoding or pass a URL.
You can pass an image to the model via API using a URL, but in that case, you will need to host the image as a publicly accessible URL.
The costs associated with using the vision feature depend on:
- whether the image is processed at high resolution,
- if so, the image's resolution,
- the total tokens for the system message, the user message, and the model's response (the assistant's output) describing the image.
https://openai.com/api/pricing/
If there is no issue with hosting the image on a server just for the model to reference, and making it publicly accessible on the internet as a URL in terms of effort or security risks, that would be fine. However, if that poses a problem, perhaps consider using base64 encoding?
import base64
import requests

# OpenAI API key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
print(response.json())
This is how you can pass an image to the model using base64 encoding. It is the same method used when attaching an image in the Playground.
Thank you very much for your help to a beginner. I use GPT to mark students' test papers and help them improve their learning efficiency.
For example, I need to correct the knowledge points they get wrong and count how often each one appears. This lets students devote more time to studying those weak points.
Therefore, confidentiality requirements are relatively low.
Anyone who knows the URL will be able to access the image.
Please consider this and determine if it truly poses no problem.
Thanks again.
There are no names on the papers; they are just from grade 1-9 students. I think it's OK.
My students' financial conditions are average, so they cannot bear excessive costs.
If you have any suggestions, please tell me.
Using a base64 encoded image object for API calls incurs no additional cost.
I am wondering about the benefits of hosting an image on a server as a publicly accessible URL.
Advantages of sending a base64 encoded image object:
- There is no need to make the image public.
Potential disadvantages of sending a base64 encoded image object:
- If you are using a metered connection, such as packet communication, the payload increases, which might lead to higher communication charges. This is not a concern if you are using a fixed-line connection.
Advantages of hosting an image and making it publicly accessible via URL:
- It can be convenient when presenting something already publicly available as a URL, like the Wikipedia example mentioned above.
Disadvantages of hosting an image and making it publicly accessible via URL:
- The additional effort and cost of hosting the image.
- Anyone who knows the URL will be able to access the image.
Since there is no difference in the API usage fee, I think it is better not to make the image public unless there is a reason to do so.
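To make that choice concrete, here is a minimal sketch (the helper name `build_image_part` is made up for illustration) that builds the image part of the message either way, so the rest of the request code stays the same whether the image is public or local. Note the base64 form inflates the payload by about 33% (4 output bytes per 3 input bytes), which only matters on a metered connection:

```python
import base64
import mimetypes

def build_image_part(source: str) -> dict:
    """Build the image_url content part for a chat completion.

    Uses the URL directly for http(s) sources; otherwise reads the
    local file and embeds it as a base64 data URI, so nothing has to
    be hosted publicly.
    """
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": source}}
    # Guess the MIME type from the extension; fall back to JPEG.
    mime = mimetypes.guess_type(source)[0] or "image/jpeg"
    with open(source, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}
```

The returned dict drops straight into the `"content"` list of a user message, as in the payload examples above.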
For one picture, I calculate the pixels to be 1900×1900.
- The cost of transferring via URL should be US$0.003825 according to the GPT pricing. But I have not successfully sent an image to GPT that way, so there is no way to know the actual cost at this time.
- When I used base64 encoding for transmission, it consumed more than 100,000 tokens.
So are you saying that if I use URL transmission, 100,000 tokens will also actually be consumed?
Here are the results of an API call with a base64-encoded image object of the same resolution, 1900×1900.
I can't find the figure of 100,000 tokens anywhere.
The actual cost is the sum of these total tokens plus the cost for the vision feature.
I couldn't find a publicly available image URL with the same resolution, so I've also included a test with a differently sized image URL.
I understand the concerns, but there should be no cost difference between sending a URL in the API payload and the model processing it, as the model reads and processes the image on its side.
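For reference, the high-detail image token count can be estimated from the tiling rule on OpenAI's pricing page: the image is scaled to fit within 2048×2048, then its shortest side is scaled to 768, and the cost is 170 tokens per 512-px tile plus an 85-token base. This is a rough sketch (the exact rule may change between models), and the count is identical whether the image arrives as a URL or as base64:

```python
import math

def vision_tokens(width: int, height: int) -> int:
    """Estimate high-detail image tokens from OpenAI's published tiling rule."""
    # Step 1: fit within a 2048 x 2048 square, preserving aspect ratio.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 2: scale so the shortest side is at most 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 3: 170 tokens per 512 x 512 tile, plus an 85-token base.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85

# A 1900x1900 image downscales to 768x768 -> 4 tiles.
print(vision_tokens(1900, 1900))  # -> 765
```

For 1900×1900 this gives 765 tokens, which at the gpt-4o input rate of roughly US$5 per million tokens at the time is the US$0.003825 figure quoted earlier, nowhere near 100,000 tokens.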
Hi dignity_for_all,
I tried the code you shared and encountered an issue. When I send an image and a prompt through the API, I get incorrect responses. However, when I test the same prompt and image manually on the website, I consistently receive the correct response.
I've tried multiple times using the API, and the response changes each time, but manually on the website the response remains consistent and correct. So far, I haven't been able to get the correct response via the API.
Do you have any idea how to address this issue? What could be the differences between the API and manual usage on the website? Aren't they supposed to provide the same service?
Thanks!
No, they are not supposed to be the same. The API has its own models.
Variety in the outputs generated is the default; if you want higher reliability with less creative variation between runs, you'd lower the top_p or temperature API parameters from 1.0 to much closer to 0.0.
The main thing missing in the example is a system message. Provide a system message first, along the lines of "You are an image inspector, providing visual analysis of user pictures with your own computer vision skill", or similar to suit the application.
Hi, thank you for the explanation.
I'm familiar with the parameters you mentioned, and I've already tried using them. Unfortunately, I'm still experiencing low success rates.
Here's the request I'm sending to the API:
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300,
    "temperature": 0,
    "top_p": 0.01
}
Do you have any suggestions to improve the reliability?
Here's the updated payload from your code, with the high-quality system role message included that I previously described as necessary:
payload = {
    "model": "gpt-4o-2024-11-20",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are an image AI assistant. Your task is to provide visual "
                "analysis of user-submitted pictures, leveraging your computer "
                "vision skills. The user may include multiple messages or "
                "attachments, such as text, images, or combined inputs. You are "
                "capable of handling a wide variety of tasks and must generate "
                "responses tailored to the content and instructions provided. "
                "Ensure your answers are precise, clear, and adapted to the "
                "user's request, whether it involves analysis, description, or "
                "other image understanding tasks."
            )
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_completion_tokens": 1500,
    "temperature": 0.1,
    "top_p": 0.1
}
Explanation of changes:
- System role message:
  - Clear and robust description of the AI's purpose and capabilities.
  - Emphasizes precision, adaptability, and versatility in fulfilling tasks.
- Line splitting:
  - Used a string concatenation method with parentheses for long lines.
  - Each line is kept under 70 characters to improve forum code readability.
- Parameters updated:
  - Highly deterministic output is usually undesired.
  - No need to have the output cut off by under-specification; the newest parameter name max_completion_tokens is used.
  - There are three different versioned gpt-4o models, as well as gpt-4-turbo, each with different qualities (and costs). I changed to the newest gpt-4o.

This ensures high-quality prompt-following and clear application specialization.
Your user message "prompt" should also be tailored well. Then look at the image itself: ensure that after the API's internal mandatory downsize (so the shortest side is at most 768 pixels) it is still clear, and that it is being sent as a base64 file, not as raw image data.
(Bonus: always do the downsize yourself, using high-quality Lanczos resampling.)
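A minimal sketch of that client-side downsize using Pillow (assuming Pillow is installed; the function name `downsize_for_vision` is made up for illustration). Shrinking locally avoids base64-encoding pixels the API would discard in its own mandatory downsize anyway:

```python
from PIL import Image

def downsize_for_vision(path_in: str, path_out: str, shortest: int = 768) -> None:
    """Downscale an image so its shortest side is at most `shortest` px,
    using high-quality Lanczos resampling, before base64-encoding it."""
    img = Image.open(path_in)
    w, h = img.size
    if min(w, h) > shortest:
        scale = shortest / min(w, h)
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.save(path_out, quality=90)
```

Run this on each test-paper scan first, then feed the output file to the `encode_image` function from the earlier example.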
Thanks,
I've tried, but unfortunately I haven't achieved a high success rate yet.
Should I consider using a gpt-4-vision model for this? Is such a model available? I read that it's well-suited for image analysis, but the information I found might be outdated.
Do you have any updated insights or recommendations on this?
The answer is for you to discover:
The previous model name gpt-4-vision-preview and its aliases have been shut off (for most everybody). The latest gpt-4-turbo points to an April 2024 model that supports vision without needing it in the name.