OpenAI vision with structured output when uploading local files

tamergezici · August 27, 2024, 1:32pm

Is it possible to use structured outputs when using the vision model?

I have pictures stored locally that I want to extract information from. I need the output in a structured JSON format that I specify myself. However, the vision tutorial uses URL requests to upload locally stored files.

https://platform.openai.com/docs/guides/vision

Whereas structured outputs require you to use chat completions.

https://platform.openai.com/docs/guides/structured-outputs

thinktank · August 27, 2024, 4:49pm

Hi,

It may not be possible to use Structured Format and Vision together—in fact, I think a Structured Format Assistant or Completion can only have Functions turned on.

Anyway, I don’t know what the info in the pictures is, but you could use the multimodal 4o to extract the information, then take another Assistant and properly structure that output.

supershaneski · August 27, 2024, 11:23pm

You can use vision with structured output using Chat Completions. However, as of now, you cannot use vision directly with structured output in the Assistants API. There is a workaround: delegate the vision step to a tool and pass its output to the main thread, which has structured output enabled.

dmc1 · September 17, 2024, 8:05pm

I don’t know if that’s true. I have this working as an example:

from pprint import pprint

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()  # reads OPENAI_API_KEY from the environment


# Schema the model's reply must conform to.
class Image(BaseModel):
    description: str
    topic: str = Field(description="the single topic of the image")


response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    response_format=Image,
    max_tokens=300,
)

# The parsed reply is an instance of Image.
pprint(response.choices[0].message.parsed)
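
For locally stored files, the same call should work if the image is base64-encoded into a data URL and passed in the `image_url` field instead of a web address. Here is a minimal sketch, not tested against every image type. The helper names `image_to_data_url`, `build_messages`, and `describe_local_image` are made up for illustration, the MIME type is assumed to be guessable from the file extension, and the model and schema are copied from the example above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(path: str, prompt: str) -> list:
    """Build a single user message holding the prompt and the local image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
            ],
        }
    ]


def describe_local_image(path: str):
    """Send a local image to the model and return the parsed structured reply."""
    # Imported here so the helpers above can be used without these packages.
    from openai import OpenAI
    from pydantic import BaseModel, Field

    # Same schema as in the example above.
    class Image(BaseModel):
        description: str
        topic: str = Field(description="the single topic of the image")

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=build_messages(path, "What's in this image?"),
        response_format=Image,
        max_tokens=300,
    )
    return response.choices[0].message.parsed
```

Calling `describe_local_image("photo.jpg")` (the file name is a placeholder) would then return an `Image` instance with `description` and `topic` filled in, assuming the request succeeds.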
