OpenAI vision with structured output when uploading local files

Is it possible to use structured outputs when using the vision model?

I have pictures locally stored which I want to extract information from. I need my outputs in a structured .json format, which I want to specify myself. however the vision tutorial uses URL requests to upload locally stored files.

https://platform.openai.com/docs/guides/vision

Whereas structured outputs require you to use chat completions.

https://platform.openai.com/docs/guides/structured-outputs

1 Like

Hi,

It may not be possible to use Structured Format and Vision together—in fact, I think a Structured Format Assistant or Completion can only have Functions turned on.

Anyway, I don’t know what the info in the pictures is, but you could use the multimodal 4o to extract the information, then take another Assistant and properly structure that output.

1 Like

you can use vision with structured output using chat completions. however, as of now, you cannot use vision directly with structured output in assistant api. but there is a workaround, you can delegate vision function as a tool and just pass the output to the main thread which has structured output.

1 Like