That is just a small transformation: you send the base64-encoded images as extra content parts inside the user message, so the message parameters change only slightly.
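As a rough sketch of what "slightly altered" means: a text-only user message carries a plain string as its content, while a message with images carries a list of typed parts. The helper name `image_part` and the placeholder bytes are made up for illustration, and no API call is made here.

```python
import base64

def image_part(image_bytes, mime_type="image/png"):
    """Wrap raw image bytes as an image_url content part using a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }

# Text-only: content is a plain string.
text_only_message = {"role": "user", "content": "Describe a granite sample."}

# With images: content becomes a list of typed parts (text first, then images).
fake_png_bytes = b"\x89PNG placeholder, not a real image"  # stand-in for file contents
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Classify this image."},
        image_part(fake_png_bytes),
    ],
}
```

Everything else about the request (model, response format, parsing) can stay as it was for the text-only case.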
Let’s make functions to load and encode images, and a function for creating a user message containing multiple images if you place them in a list.
Entire code:
import base64
from typing import List, Optional

from openai import OpenAI
from pydantic import BaseModel


def encode_image_to_base64(file_path):
    """
    Encodes a given image file to base64.

    Args:
        file_path (str): Path to the image file.

    Returns:
        str: Base64 encoded string of the image.
    """
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def create_message_with_images(image_paths):
    """
    Creates a user message containing images encoded in base64.

    Args:
        image_paths (list of str): List of image file paths.

    Returns:
        list of dict: A one-element messages list ready to be used with the OpenAI API.
    """
    # The preset text part comes first; each image is appended after it
    content = [
        {"type": "text", "text": "Classify this image using computer vision skill."}
    ]
    for path in image_paths:
        base64_image = encode_image_to_base64(path)
        content.append({
            "type": "image_url",
            "image_url": {
                # The data URL is labeled as PNG regardless of the actual file type
                "url": f"data:image/png;base64,{base64_image}"
            }
        })
    return [{
        "role": "user",
        "content": content
    }]


class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None


client = OpenAI()

# Example usage with image file paths provided by the user
image_paths = ['./img1.png', './img2.png']
messages = create_message_with_images(image_paths)

# Use the 'messages' list correctly with the 'parse' method
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,  # Ensure this is a list of message dictionaries
    max_tokens=1000,
    response_format=FirstDetectionData  # Use the Pydantic model as the response format
)

# Access the structured response
structured_response = response.choices[0].message.parsed
print(structured_response)
The usage is way down at the bottom: you provide a list of file paths, then call create_message_with_images to build the user message (with preset text) that carries the image attachments. All the structured formatting and parsing remains the same.
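One thing to watch: the code above labels every image as image/png in the data URL. If your list mixes file formats, a possible tweak is to guess the MIME type from the file extension with Python's standard mimetypes module. This is only a sketch; `guess_image_mime` and `encode_image_as_data_url` are hypothetical helper names, not part of the OpenAI SDK, and falling back to PNG is my assumption.

```python
import base64
import mimetypes

def guess_image_mime(file_path, default="image/png"):
    """Guess an image MIME type from the file extension, falling back to a default."""
    mime, _ = mimetypes.guess_type(file_path)
    if mime and mime.startswith("image/"):
        return mime
    return default  # assumption: treat unknown or non-image extensions as PNG

def encode_image_as_data_url(file_path):
    """Read a file and return a base64 data URL with a guessed MIME type."""
    with open(file_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{guess_image_mime(file_path)};base64,{encoded}"
```

You could then use `encode_image_as_data_url(path)` as the "url" value inside create_message_with_images instead of the hard-coded PNG f-string.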
Understanding what’s going on, instead of pasting someone else’s understanding, will help you create your own solutions. What’s missing for that understanding is a working demonstration, which OpenAI leaves up to you.
Yes, I agree, the OpenAI documentation is really poorly written… I wonder why they don't create a GPT with their updated documentation instead of changing all the conventions right after their last training in October 2023. Anyway, the code works perfectly fine, thank you very much.