I’ve recently started working on a project that used to upload images to the ChatGPT API by encoding them in base64. However, the developer who originally built the solution stopped working on it, saying the API had stopped accepting images as input. I thought this would be a simple thing to verify in the docs, but somehow I haven’t found whether this is true or not. Can anyone verify?
Nothing like that comes to mind; image vision ability has not been taken away, except for sending images as a function tool return on newer models that one might switch to.
There are two different methods by which you can supply an image:
Chat Completions
- send an internet URL for an image as part of a user message's content
- send a base64-encoded image as part of a user message's content
Assistants
- send an internet URL with a user message
- upload a file to storage, and send the file ID with a user message
You should determine which of these API endpoints is in use, and how images were being employed.
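For reference, a rough sketch of the Chat Completions shape (the prompt text and image URL here are placeholders): the user message "content" becomes an array of typed parts rather than a plain string:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
  ]
}

For the base64 method, the "url" field instead carries a data URL, e.g. "data:image/png;base64,<encoded bytes>".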
Hello, this code results in garbage characters only, though I think I am following the same structure to send an image to ChatGPT. I also tried a different model, but got the same results. Here is my R code.
Sorry, here is the R code. Not shown are the environment setup, initialization, and package loading.
text_prompt <- "Please analyze the contents of this image and describe its key features."

# Prepare the request with multipart form-data
response <- POST(
  url,
  add_headers(
    Authorization = paste("Bearer", apiKey)
  ),
  body = list(
    model = "gpt-4-turbo",
    messages = toJSON(list(
      list(role = "user", content = text_prompt)
    ), auto_unbox = TRUE),
    file = upload_file(image_path)
  ),
  encode = "multipart"
)

# Parse and display the response
result <- content(response, "parsed")
print(result)
Nothing previously in this topic indicated that R was being used; it is a challenging platform for this, with no SDK to ease making requests to the API.
The previous code cannot function, and was likely written by an uninformed bot, as it is completely imaginary. It doesn’t even use a vision AI model.
You don’t “upload” to Chat Completions or use multipart/form-data; the image is base64-encoded as a content part of a user message, sent in a request body of type “application/json”.
Besides R being the right place to do math and the wrong place to develop chatbots, here’s a more plausible solution.
This R script performs the following steps:
- Environment Setup: Loads necessary libraries.
- Image Encoding: Reads and base64 encodes the image.
- Message Construction: Constructs system and user messages, including the encoded image.
- Request Parameters: Sets up the API request parameters.
- HTTP Request: Sends the POST request to the OpenAI API.
- Response Handling: Processes and displays the response.
Complete R Code
# Load necessary libraries
library(httr)
library(jsonlite)
library(base64enc)

# Set your API key as an environment variable or directly assign it
api_key <- Sys.getenv("OPENAI_API_KEY")
# Alternatively, you can set it directly (not recommended for security reasons)
# api_key <- "your_openai_api_key_here"

# Define the image path
image_path <- "./img1.png"

# Read and base64 encode the image
if (!file.exists(image_path)) {
  stop("Image file does not exist at the specified path.")
}
image_binary <- readBin(image_path, what = "raw", n = file.info(image_path)$size)
base64_image <- base64encode(image_binary)
data_url <- paste0("data:image/png;base64,", base64_image)

# Construct the system message
system_message <- list(
  role = "system",
  content = paste(
    "You are ChatPal, an AI assistant powered by GPT-4o, with computer vision.",
    "AI knowledge cutoff: October 2023",
    "",
    "Built-in vision capabilities:",
    "- extract text from image",
    "- describe images",
    "- analyze image contents",
    "- logical problem solving requiring reasoning and contextual consideration",
    sep = "\n"
  )
)

# Construct the user message with text and image_url content parts
user_message <- list(
  role = "user",
  content = list(
    list(type = "text", text = "Analyze this image, using built in vision."),
    list(type = "image_url", image_url = list(url = data_url))
    # Add additional text or image_url blocks here if needed
  )
)

# Combine messages (an unnamed list becomes a JSON array)
messages <- list(system_message, user_message)

# Define the request parameters
params <- list(
  model = "gpt-4o",
  messages = messages,
  max_tokens = 1500,
  top_p = 0.5,
  temperature = 0.5
)

# Define the headers
headers <- add_headers(
  `Content-Type` = "application/json",
  Authorization = paste("Bearer", api_key)
)

# Make the POST request
response <- POST(
  url = "https://api.openai.com/v1/chat/completions",
  headers,
  body = toJSON(params, auto_unbox = TRUE, pretty = TRUE)
)

# Check for HTTP errors
if (http_error(response)) {
  status <- status_code(response)
  error_message <- content(response, "text", encoding = "UTF-8")
  stop(sprintf("HTTP error %s: %s", status, error_message))
}

# Parse the response; httr's default JSON parser keeps nested lists
# (passing simplifyVector = TRUE would flatten `choices` into a data
# frame and break the extraction below)
response_content <- content(response, "parsed")

# Extract and print the assistant's reply
assistant_reply <- response_content$choices[[1]]$message$content
cat("Assistant's Reply:\n", assistant_reply, "\n\n")

# Extract and print usage information
usage_info <- response_content$usage
cat("Usage Information:\n")
print(usage_info)
JSON Structure: We use the R list to create both JSON objects and arrays. The toJSON function from the jsonlite package will treat a list as a JSON object if it is named (i.e., has keys), and as an array if it is unnamed. By controlling the use of names in our list constructs, we ensure the correct JSON data structure. This distinction is not obvious from the code alone.
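A quick demonstration of that behavior, runnable on its own (assuming jsonlite is installed):

library(jsonlite)

# A named list serializes as a JSON object
toJSON(list(role = "user", content = "hi"), auto_unbox = TRUE)
#> {"role":"user","content":"hi"}

# An unnamed list serializes as a JSON array
toJSON(list(list(type = "text", text = "hello")), auto_unbox = TRUE)
#> [{"type":"text","text":"hello"}]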
Explanation of Key Components
- Libraries:
  - httr: for handling HTTP requests.
  - jsonlite: for JSON serialization and deserialization.
  - base64enc: for base64 encoding of the image.
- API Key Handling:
  - The API key is retrieved from an environment variable for security. Ensure that the OPENAI_API_KEY environment variable is set on your system.
  - Alternatively, you can assign the API key directly, but this is not recommended due to security concerns.
- Image Encoding:
  - The image is read in binary mode and then base64 encoded.
  - A data URL is constructed by prefixing the encoded string with data:image/png;base64,.
- Message Construction:
  - System message: defines the assistant’s capabilities and context.
  - User message: contains a list with both text and image URL objects. This mirrors the structure used in the Python example further below.
- Request Parameters:
  - Specifies the model (gpt-4o), the combined messages, and other parameters like max_tokens, top_p, and temperature.
- HTTP Request:
  - Uses POST to send the request to the OpenAI API endpoint.
  - The body of the request is serialized to JSON using toJSON with auto_unbox = TRUE to ensure proper formatting.
- Error Handling:
  - Checks whether the HTTP response contains an error and stops execution with an error message if so.
- Response Parsing:
  - Extracts the assistant’s reply and usage information from the JSON response.
  - Prints the assistant’s reply and usage details to the console.
Additional Notes
- Error Handling: The script includes basic error handling for missing image files and HTTP errors. You can expand this to handle more specific cases as needed.
- Security: Always keep your API keys secure. Avoid hardcoding them into scripts, especially if the code is to be shared or stored in version control systems.
- Dependencies: Ensure that all required packages (httr, jsonlite, base64enc) are installed. You can install any missing packages using install.packages("package_name").
- Model Name: The model is specified as "gpt-4o". Ensure that this is the correct model name as per OpenAI’s API documentation. If it’s a typo and should be "gpt-4-turbo", please update accordingly.
Installing Required Packages
If you haven’t already installed the necessary packages, you can do so using the following commands:
install.packages("httr")
install.packages("jsonlite")
install.packages("base64enc")
Setting the OpenAI API Key
Before running the script, set your OpenAI API key as an environment variable. You can do this in R as follows:
Sys.setenv(OPENAI_API_KEY = "your_openai_api_key_here")
Alternatively, set it in your system’s environment variables to persist across sessions.
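For instance, one common way to persist it across R sessions is a line in your ~/.Renviron file, which R reads automatically at startup (the key value shown is a placeholder):

# In ~/.Renviron, add a line such as:
# OPENAI_API_KEY=your_openai_api_key_here

# To pick it up without restarting the current session:
readRenviron("~/.Renviron")
Sys.getenv("OPENAI_API_KEY")  # verify it is set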
Running the Script
Ensure that the image file (img1.png) exists at the specified path. Run the script in your R environment. Upon successful execution, it will display the assistant’s reply and usage information.
This was also AI-produced, but by someone qualified to review it; it looks OK, though it has not been run.
This R script should replicate the functionality of the demonstrative Python example below, constructing the appropriate API request, handling image encoding, and processing the response from the OpenAI API. The Python version is a better reference for how to make vision requests; it shows the construction of a request without using the Python SDK, with more programming detail than cURL examples.
import os        # to obtain environment variables, or file operations
import base64    # to encode image file to string
import requests  # or `httpx as requests`

image_path = "./img1.png"
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

# A system message must indicate vision ability, or face denials
system = [{"role": "system", "content": """
You are ChatPal, an AI assistant powered by GPT-4o, with computer vision.
AI knowledge cutoff: October 2023

Built-in vision capabilities:
- extract text from image
- describe images
- analyze image contents
- logical problem solving requiring reasoning and contextual consideration
""".strip()
}]

# A user message "content" is now an array of type objects instead of a string
user = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this image, using built in vision.",
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            },
            # additional text or image_url blocks
        ],
    }
]

# We construct the dictionary as a streamable input to the JSON parameter
params = {
    # `model` must specifically support multimodal input for images
    "model": "gpt-4o",
    # An initial session input, not having a chat history replayed
    "messages": system + user,
    # Parameters for model operation
    "max_tokens": 1500, "top_p": 0.5, "temperature": 0.5,
}

headers = {
    # Content-Type is added by `requests`, but we demonstrate it
    "Content-Type": "application/json",
    # API key is obtained from OpenAI standard environment variable
    "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"
}

response = requests.post("https://api.openai.com/v1/chat/completions",
                         headers=headers, json=params)

if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    # print(response.json())  # full response example
    print(response.json()['choices'][0]['message']['content'])
    print(response.json()['usage'])
Thank you so much, yes this code works.