Is it possible to use structured outputs when using the vision model?
I have pictures locally stored which I want to extract information from. I need my outputs in a structured .json format, which I want to specify myself. however the vision tutorial uses URL requests to upload locally stored files.
https://platform.openai.com/docs/guides/vision
Whereas structured outputs require you to use chat completions.
https://platform.openai.com/docs/guides/structured-outputs
1 Like
Hi,
It may not be possible to use Structured Format and Vision together—in fact, I think a Structured Format Assistant or Completion can only have Functions turned on.
Anyway, I don’t know what the info in the pictures is, but you could use the multimodal 4o to extract the information, then take another Assistant and properly structure that output.
1 Like
you can use vision with structured output using chat completions. however, as of now, you cannot use vision directly with structured output in assistant api. but there is a workaround, you can delegate vision function as a tool and just pass the output to the main thread which has structured output.
1 Like