OpenAI’s Python SDK (v1.37.1) doesn’t support direct image input to GPT models, including gpt-4-turbo or gpt-4o-mini. The API only accepts text inputs, so image-based tasks like OCR or image detection are not possible.
Is there any way to address this challenge?
@sps Thank you for your response!
However, I want to clarify that while Base64 encoding converts images into a text format, OpenAI’s API does not currently support processing images, including those encoded in Base64, for tasks like image/symbol identification.
The Base64 encoding itself is just a method of encoding the image for transmission, not a means of recognizing or processing the image content.
Yes, the image has to be sent to the model in order for it to “see” it. Base64 is a way of encoding the image to prepare it for transmission to the model so it can be processed. The other option is to send the URL of the image where it’s hosted on the web.
The images are sent as image content blocks, which is different from text. I recommend running the previously shared boilerplate code from the docs.
You can also quickly experiment using Playground to test Vision capabilities.
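For the hosted-URL option mentioned above, a minimal sketch looks like this (the URL is a placeholder you’d swap for your own hosted image):

import openai

client = openai.Client()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL -- point this at your own hosted image
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)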
I want to clarify that OpenAI’s API and models do support processing images: GPT-4 vision models can view them directly.
Here is example code showing correct usage of OpenAI’s Python SDK.
import base64
import openai

# Read the image file and encode it in base64
with open("image.jpg", "rb") as image_file:
    base64_encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

# Prepare the message with the prompt and image
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this image using built-in vision, extracting text.",
            },
            {
                "type": "image_url",
                # The MIME type in the data URL matches the .jpg file read above
                "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_image}"},
            },
        ],
    }
]

# Set up the parameters for the API request
parameters = {
    "model": "gpt-4o",
    "messages": messages,
}

# Initialize the OpenAI client
client = openai.Client()

# Send the request and receive the response
response = client.chat.completions.create(**parameters)
print(response.choices[0].message.content)
And here are the AI’s results from sending it a screenshot of the post above:
Certainly! Here is the extracted text from the image:
senseesha 28m
@sps Thank you for your response! However, I want to clarify that while Base64 encoding converts images into a text format, OpenAI’s API does not currently support processing images, including those encoded in Base64, for tasks like image/symbol identification. The Base64 encoding itself is just a method of encoding the image for transmission, not a means of recognizing or processing the image content.
I hope that clarifies usage, and you can now proceed with your desired vision task!
Thanks again for the clarification!
While I understand that Base64 is a method for encoding images, my concern remains that OpenAI’s current models do not support logo/symbol identification. Also, I have millions of images, and hosting them online isn’t feasible.
Are there any other options available?
The AI model has also been trained on millions of images.
It may do a good job of describing and identifying many of them, as that use case is neither prohibited nor beyond its capabilities.
That may not fully overlap with what you actually seem to describe, though, which sounds more like an image search engine that returns exact existing matches.
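If that’s the real need, one lightweight way to do exact/near-duplicate lookup locally is a perceptual hash. This is just an illustrative sketch, not an OpenAI feature; the file paths and names are placeholders:

from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    # Shrink to a tiny grayscale grid and threshold each pixel against the mean
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits

def hamming(a: int, b: int) -> int:
    # Number of differing bits; a small distance suggests a near-duplicate
    return bin(a ^ b).count("1")

# Index every known logo once, then compare new images against the index
known_logos = {p: average_hash(p) for p in ["logo_a.png", "logo_b.png"]}
query = average_hash("query.png")
best = min(known_logos, key=lambda p: hamming(known_logos[p], query))
print("closest match:", best)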
Let me know how models like gpt-4o, gpt-4-turbo, and their snapshot versions work for you. They’ll at least be able to talk about the elements of the logo they see.
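Also, since base64 encoding works entirely from local files, hosting your millions of images isn’t actually required. Here’s a rough sketch of batching local files through a couple of models; the directory, prompt, and model list are assumptions, and at that scale you’d want parallel workers or the Batch API rather than a serial loop:

import base64
import mimetypes
from pathlib import Path

import openai

client = openai.Client()

def identify(path: Path, model: str) -> str:
    # Guess the MIME type so the data URL matches the file on disk
    mime = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    b64 = base64.b64encode(path.read_bytes()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Identify any logos or symbols in this image."},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content

# "images/" is a placeholder directory
for path in sorted(Path("images").glob("*.png")):
    for model in ("gpt-4o", "gpt-4-turbo"):
        print(path.name, model, "->", identify(path, model))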