I want structured output from an image

I’m using the API with base64-encoded images, as suggested by the documentation: https://platform.openai.com/docs/guides/vision

I would like to get structured output according to this class:

class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None

I tried to insert “response_format”: FirstDetectionData into the payload, but it doesn’t work. Can anybody help me?
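One reason inserting the Pydantic class into a raw JSON payload fails: the REST endpoint doesn’t know anything about Pydantic. When you POST the payload yourself, `response_format` has to be a `json_schema` object. Below is a sketch of that shape, with the schema written out by hand to mirror the `FirstDetectionData` class above (if you use the Python SDK’s `parse` helper instead, it builds this for you):

```python
import json

# Hand-written JSON Schema mirroring the FirstDetectionData Pydantic model.
# Note: in strict mode every property must appear in "required", so the
# optional field is expressed as type ["string", "null"] instead.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "FirstDetectionData",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "description": {"type": "string"},
                "Lithotype": {"type": "array", "items": {"type": "string"}},
                "Pathology": {"type": "array", "items": {"type": "string"}},
                "relative_path": {"type": ["string", "null"]},
            },
            "required": ["description", "Lithotype", "Pathology", "relative_path"],
            "additionalProperties": False,
        },
    },
}

# This dict goes into the payload alongside "model" and "messages".
payload_fragment = json.dumps({"response_format": response_format})
```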

I hope this helps.

from pydantic import BaseModel
from typing import List, Optional
from openai import OpenAI

client = OpenAI()

# Define your messages with the image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Classify this image using computer vision skill."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://boneclones.com/images/store-product/product-1530-main-main-big-1531762823.jpg",
                }
            },
        ]
    }
]



class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None


# Use messages
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    max_tokens=1000,
    response_format=FirstDetectionData  # Use the Pydantic model as the response format
)

# Access the structured response
structured_response = response.choices[0].message.parsed
print(structured_response)

Not really… I have the images stored locally…

That is just a small transformation, where you send base64-encoded images in a slightly altered set of message parameters.

Let’s make functions to load and encode images, and a function for creating a user message containing multiple images if you place them in a list.

Entire code:

from pydantic import BaseModel
from typing import List, Optional
import base64
from openai import OpenAI
from PIL import Image
import os

def encode_image_to_base64(file_path):
    """
    Encodes a given image file to base64.

    Args:
        file_path (str): Path to the image file.

    Returns:
        str: Base64 encoded string of the image.
    """
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def create_message_with_images(image_paths):
    """
    Creates a message dictionary containing images encoded in base64.

    Args:
        image_paths (list of str): List of image file paths.

    Returns:
        dict: Message dictionary ready to be used with the OpenAI API.
    """
    content = [
        {"type": "text", "text": "Classify this image using computer vision skill."}
    ]
    
    for path in image_paths:
        base64_image = encode_image_to_base64(path)
        content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:image/png;base64,{base64_image}"
            }
        })

    return [{
        "role": "user",
        "content": content
    }]

class FirstDetectionData(BaseModel):
    description: str
    Lithotype: List[str]
    Pathology: List[str]
    relative_path: Optional[str] = None

client = OpenAI()

# Example usage with image file paths provided by the user
image_paths = ['./img1.png', './img2.png']
messages = create_message_with_images(image_paths)

# Use the 'messages' list correctly with the 'parse' method
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,  # Ensure this is a list of message dictionaries
    max_tokens=1000,
    response_format=FirstDetectionData  # Use the Pydantic model as the response format
)

# Access the structured response
structured_response = response.choices[0].message.parsed
print(structured_response)

Although it is way down at the bottom, you will see that the only usage you need to provide is a list of files, followed by the call that creates the user message (with preset text) containing the image attachments. All the structured formatting and parsing remains the same.

image_paths = ['./img1.png', './img2.png']
messages = create_message_with_images(image_paths)

Understanding what’s going on, instead of pasting someone else’s understanding, will help you create your own solutions. A working demonstration, which OpenAI leaves up to you, is what’s missing from that understanding.

Yes, I agree the OpenAI documentation is really poorly written… I wonder why they don’t create a GPT with their updated documentation instead of changing all the conventions right after the models’ last training cutoff in October 2023. Anyway, the code works perfectly fine, thank you very much.


Is this the most up-to-date method? What models work with structured outputs from images now?

“gpt-4o-mini” and “gpt-4o-2024-08-06”… surprisingly, plain “gpt-4o” does not work with structured outputs (which was one of the reasons I got stuck).
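Since model support shifts over time, one way to keep code working is to try a list of candidate model names in order and fall back when one rejects structured outputs. This is a hypothetical helper pattern, not an official API; `call` stands in for whatever performs the actual `parse` request:

```python
from typing import Callable, List, Optional

def parse_with_fallback(models: List[str], call: Callable[[str], object]) -> object:
    """Try each model name in order; return the first successful result.

    `call` performs the actual request, e.g.
    lambda m: client.beta.chat.completions.parse(model=m, messages=..., response_format=...)
    Any exception is treated as "this model doesn't support it" and the
    next candidate is tried.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # e.g. the API rejecting response_format
            last_error = exc
    raise RuntimeError(f"No candidate model worked: {last_error}")
```

Usage would look like `parse_with_fallback(["gpt-4o-2024-08-06", "gpt-4o-mini"], do_call)`, where `do_call` wraps the `parse` call from the code above.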