OpenAI vision with structured output when uploading local files

Is it possible to use structured outputs when using the vision model?

I have pictures stored locally that I want to extract information from. I need the output in a structured JSON format that I specify myself. However, the vision tutorial uses URL requests to upload locally stored files.

https://platform.openai.com/docs/guides/vision

Structured outputs, on the other hand, require you to use chat completions.

https://platform.openai.com/docs/guides/structured-outputs


Hi,

It may not be possible to use Structured Outputs and Vision together. In fact, I think an Assistant or Completion with Structured Outputs can only have Functions turned on.

Anyway, I don’t know what information the pictures contain, but you could use the multimodal GPT-4o model to extract it, then have a second Assistant structure that output properly.


You can use vision with structured output through chat completions. However, as of now, you cannot use vision directly with structured output in the Assistants API. There is a workaround: delegate the vision step to a tool and pass its output to the main thread, which has structured output enabled.


I don’t know if that’s true; I have this working as an example.

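from pprint import pprint

from openai import OpenAI
from pydantic import BaseModel, Field

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()


# Pydantic model describing the structured output we want back.
class Image(BaseModel):
    description: str
    topic: str = Field(description='the single topic of the image')


# parse() sends the schema as response_format and validates the reply.
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    response_format=Image,
    max_tokens=300,
)

# The validated Image instance is available on the message's parsed attribute.
pprint(response.choices[0].message.parsed)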
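
Since the original question was about pictures stored locally, here is a minimal sketch of how the same call might be fed a local file: the image is read from disk and passed as a base64 data URL in the `image_url` field instead of a web URL. The helper names (`image_to_data_url`, `build_image_message`, `describe_local_image`) are hypothetical and made up for this illustration, and the model name and `Image` schema are simply carried over from the example above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(path: str, prompt: str) -> dict:
    """Build a user message holding a text prompt plus the local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }


def describe_local_image(path: str, prompt: str = "What's in this image?"):
    """Hypothetical wrapper: send a local image and get a parsed Image back."""
    # Imported here so the two helpers above work without the SDK installed.
    from openai import OpenAI
    from pydantic import BaseModel, Field

    class Image(BaseModel):
        description: str
        topic: str = Field(description="the single topic of the image")

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[build_image_message(path, prompt)],
        response_format=Image,
        max_tokens=300,
    )
    return response.choices[0].message.parsed
```

Calling `describe_local_image("photo.jpg")` (with a path of your own) should then return an object with `description` and `topic` fields, assuming the request succeeds and the model does not refuse. Keeping the encoding step in its own function makes it easy to check without making an API call.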