OpenAI’s Python SDK (v1.37.1) doesn’t support direct image input to GPT models, including gpt-4-turbo or gpt-4o-mini. The API only accepts text inputs, so image-based tasks like OCR or image detection are not possible.
Is there any way to address this challenge?
@sps Thank you for your response!
However, I want to clarify that while Base64 encoding converts images into a text format, OpenAI’s API does not currently support processing images, including those encoded in Base64, for tasks like image/symbol identification.
The Base64 encoding itself is just a method of encoding the image for transmission, not a means of recognizing or processing the image content.
Yes, the image has to be sent to the model in order for it to “see” it. Base64 is a way of encoding the image to prepare it for transmission to the model so it can be processed. The other option is to send the URL of the image where it’s hosted on the web.
The images are sent as image content blocks, which is different from text. I recommend running the previously shared boilerplate code from the docs.
You can also quickly experiment using Playground to test Vision capabilities.
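For the hosted-URL option mentioned above, a minimal sketch looks like this (the URL is a placeholder you’d swap for your own hosted image):

import openai

client = openai.Client()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL -- point this at your own hosted image
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)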
I want to clarify that OpenAI’s API and models do support processing images: GPT-4 vision models can view them directly.
Here is example code showing correct usage of OpenAI’s Python SDK.
import base64
import openai

# Read the image file and encode it in base64
with open("image.jpg", "rb") as image_file:
    base64_encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

# Prepare the message with the prompt and image
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this image using built-in vision, extracting text.",
            },
            {
                "type": "image_url",
                # The MIME type in the data URL matches the .jpg file read above
                "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_image}"},
            },
        ],
    }
]

# Set up the parameters for the API request
parameters = {
    "model": "gpt-4o",
    "messages": messages,
}

# Initialize the OpenAI client
client = openai.Client()

# Send the request and receive the response
response = client.chat.completions.create(**parameters)
print(response.choices[0].message.content)
And here are the AI’s results from sending it a screenshot of the post above:
Certainly! Here is the extracted text from the image:
senseesha 28m
@sps Thank you for your response! However, I want to clarify that while Base64 encoding converts images into a text format, OpenAI’s API does not currently support processing images, including those encoded in Base64, for tasks like image/symbol identification. The Base64 encoding itself is just a method of encoding the image for transmission, not a means of recognizing or processing the image content.
I hope that clarifies usage, and you can now proceed with your desired vision task!
Thanks again for the clarification!
While I understand that Base64 is a method for encoding images, my concern remains that OpenAI’s current models do not support logo/symbol identification. Also, I have millions of images, and hosting them online isn’t feasible.
Are there any other options available?
The AI model has also been trained on millions of images.
It may do a good job of describing and identifying many of them, as that use case is neither prohibited nor beyond its capabilities.
That may not fully overlap with what you actually seem to describe, though, which sounds more like an image search engine that returns exact existing matches.
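If that’s the real need, one lightweight way to do exact/near-duplicate lookup locally is a perceptual hash. This is just an illustrative sketch, not an OpenAI feature; the file paths and names are placeholders:

from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    # Shrink to a tiny grayscale grid and threshold each pixel against the mean
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits

def hamming(a: int, b: int) -> int:
    # Number of differing bits; a small distance suggests a near-duplicate
    return bin(a ^ b).count("1")

# Index every known logo once, then compare new images against the index
known_logos = {p: average_hash(p) for p in ["logo_a.png", "logo_b.png"]}
query = average_hash("query.png")
best = min(known_logos, key=lambda p: hamming(known_logos[p], query))
print("closest match:", best)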
Let me know how models like gpt-4o, gpt-4-turbo, and their snapshot versions work for you. They’ll at least be able to talk about the elements of the logo they see.
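Also, since base64 encoding works entirely from local files, hosting your millions of images isn’t actually required. Here’s a rough sketch of batching local files through a couple of models; the directory, prompt, and model list are assumptions, and at that scale you’d want parallel workers or the Batch API rather than a serial loop:

import base64
import mimetypes
from pathlib import Path

import openai

client = openai.Client()

def identify(path: Path, model: str) -> str:
    # Guess the MIME type so the data URL matches the file on disk
    mime = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    b64 = base64.b64encode(path.read_bytes()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Identify any logos or symbols in this image."},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content

# "images/" is a placeholder directory
for path in sorted(Path("images").glob("*.png")):
    for model in ("gpt-4o", "gpt-4-turbo"):
        print(path.name, model, "->", identify(path, model))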