You mentioned that base64 encoding is unnecessary for cost reasons, but an API call costs the same whether you send the image base64-encoded or pass a URL.
You can pass an image to the model via the API using a URL, but in that case you will need to host the image at a publicly accessible URL.
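For comparison, here is a minimal sketch of what the URL variant of the request body could look like. The helper name `build_url_payload` and the `example.com` address are placeholders I made up for illustration; the only real difference from the base64 version is the value of the `url` field.

```python
def build_url_payload(image_url, prompt="What's in this image?", model="gpt-4o"):
    """Build a Chat Completions request body that references a hosted image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # The image must be reachable from the public internet.
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }


# Hypothetical public URL, used only as an example.
payload = build_url_payload("https://example.com/path_to_your_image.jpg")
```

This body would be sent with the same headers and endpoint as a base64 request.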
The cost of using the vision feature depends on:
- Whether the image is processed at high resolution,
- If so, the dimensions of the image,
- The total tokens for the system message, user message, and the model's response (the assistant's output) describing the image.
https://openai.com/api/pricing/
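To make the resolution factor concrete, here is a rough sketch of how image tokens have been calculated for gpt-4o according to OpenAI's vision guide at the time of writing: a flat 85 tokens in low-detail mode, and in high-detail mode 170 tokens per 512px tile plus 85 base tokens after the image is scaled down. The function name `estimate_image_tokens` is my own, the sketch assumes images are never upscaled, and the numbers may change or differ by model, so treat the pricing page as authoritative.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate the token cost of one image (a sketch, not an official formula)."""
    if detail == "low":
        return 85  # low detail is a flat cost regardless of size

    # Step 1: scale down to fit within a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down so the shortest side is at most 768px
    # (assumption: smaller images are left as-is, not upscaled).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512px tiles needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


square_tokens = estimate_image_tokens(1024, 1024)  # 4 tiles
tall_tokens = estimate_image_tokens(2048, 4096)    # 6 tiles
```

Note that the input is the same whether the image arrives as base64 or as a URL, which is why the transport method does not affect the cost.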
If hosting the image on a server and making it publicly accessible on the internet, just so the model can reference it, is not a problem in terms of effort or security risk, then passing a URL is fine. If it is a problem, consider using base64 encoding instead:
```python
import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
print(response.json())
```
This is how you can pass an image to the model using base64 encoding. It is the same method used when attaching an image in the Playground.
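One small caveat: the data URL should declare the image's actual type, so `image/jpeg` is only right for JPEG files. Below is a minimal sketch of a helper that guesses the MIME type from the file extension using the standard library. The name `image_to_data_url` is hypothetical, and guessing from the extension is an assumption that can be wrong for misnamed files.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Read an image file and return it as a base64 data URL."""
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")

    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the `url` value inside the `image_url` object.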