I want structured output from an image

I’m using the API with base64-encoded images, as suggested by the documentation: https://platform.openai.com/docs/guides/vision

I would like structured output according to this class:

class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None

I tried inserting “response_format”: FirstDetectionData into the payload, but it doesn’t work. Can anybody help me?

I hope this helps.

from pydantic import BaseModel
from typing import List, Optional
from openai import OpenAI

client = OpenAI()

# Define your messages with the image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Classify this image using computer vision skill."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://boneclones.com/images/store-product/product-1530-main-main-big-1531762823.jpg",
                }
            },
        ]
    }
]



class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None


# Use messages
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    max_tokens=1000,
    response_format=FirstDetectionData  # Use the Pydantic model as the response format
)

# Access the structured response
structured_response = response.choices[0].message.parsed
print(structured_response)
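Note that the parsed result is a regular Pydantic model instance, so you can access its fields directly or serialize it back to JSON. A small sketch (assuming Pydantic v2; the instance below is constructed by hand for illustration, not actual API output — in practice it comes from `response.choices[0].message.parsed`):

```python
from typing import List, Optional
from pydantic import BaseModel

class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None

# Hand-built example instance; in practice this comes from
# response.choices[0].message.parsed
data = FirstDetectionData(
    description="A human skull replica",
    Lithotype=[],
    Pathology=["healed fracture"],
)

print(data.description)        # plain attribute access
print(data.model_dump_json())  # serialize back to JSON (Pydantic v2)
```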

Not really… I have the images stored locally…

That is just a small transformation: you send base64-encoded images in a slightly altered set of message parameters.

Let’s write a function to load and encode an image, and a function that creates a user message containing multiple images from a list of file paths.

Entire code:

from pydantic import BaseModel
from typing import List, Optional
import base64
from openai import OpenAI

def encode_image_to_base64(file_path):
    """
    Encodes a given image file to base64.

    Args:
        file_path (str): Path to the image file.

    Returns:
        str: Base64 encoded string of the image.
    """
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def create_message_with_images(image_paths):
    """
    Creates a message dictionary containing images encoded in base64.

    Args:
        image_paths (list of str): List of image file paths.

    Returns:
        dict: Message dictionary ready to be used with the OpenAI API.
    """
    content = [
        {"type": "text", "text": "Classify this image using computer vision skill."}
    ]
    
    for path in image_paths:
        base64_image = encode_image_to_base64(path)
        content.append({
            "type": "image_url",
            "image_url": {
                # Assumes PNG input; adjust the MIME type for JPEGs etc.
                "url": f"data:image/png;base64,{base64_image}"
            }
        })

    return [{
        "role": "user",
        "content": content
    }]

class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None

client = OpenAI()

# Example usage with image file paths provided by the user
image_paths = ['./img1.png', './img2.png']
messages = create_message_with_images(image_paths)

# Use the 'messages' list correctly with the 'parse' method
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,  # Ensure this is a list of message dictionaries
    max_tokens=1000,
    response_format=FirstDetectionData  # Use the Pydantic model as the response format
)

# Access the structured response
structured_response = response.choices[0].message.parsed
print(structured_response)

Although it is near the bottom, the part you need to provide is the list of files, followed by the call that creates the user message (with preset text) containing the image attachments. All the structured-output formatting and parsing remains the same.

image_paths = ['./img1.png', './img2.png']
messages = create_message_with_images(image_paths)

Understanding what’s going on, rather than pasting someone else’s understanding, will help you create your own solutions. The missing piece is a working demonstration, which OpenAI leaves up to you.

Yes, I agree the OpenAI documentation is really poorly written… I wonder why they don’t create a GPT with their updated documentation, instead of changing all the conventions right after the models’ October 2023 training cutoff. Anyway, the code works perfectly fine. Thank you very much!

Is this the most up-to-date method? Which models work with structured outputs from images now?

“gpt-4o-mini” and “gpt-4o-2024-08-06”… surprisingly, “gpt-4o” does not work with structured output (which was one of the reasons I got stuck).
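For what it’s worth, a small guard can save a failed round trip when a model doesn’t support structured outputs. The model set below is just what this thread reports, not an authoritative list; check the official documentation for the current one:

```python
# Models reported in this thread to work with structured outputs.
# NOTE: this set is an assumption based on the discussion above,
# not an official list - consult the documentation before relying on it.
STRUCTURED_OUTPUT_MODELS = {"gpt-4o-mini", "gpt-4o-2024-08-06"}

def supports_structured_output(model: str) -> bool:
    """Return True if the model is known (per this thread) to support
    the structured-output `parse` helper."""
    return model in STRUCTURED_OUTPUT_MODELS

print(supports_structured_output("gpt-4o-mini"))  # True
print(supports_structured_output("gpt-4o"))       # False, per this thread
```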