What is the best way to allow the Structured Outputs API to return empty objects?

sebastian.chejniak · October 3, 2024, 4:13pm

Take the following example (slightly modified from the documentation so that the event has no participants):

from pydantic import BaseModel#, Field
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str] #= Field(default_factory=list)

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "There are no participants going to the science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

print(event)

The model can (and typically does) return an empty list for participants. I am admittedly unfamiliar with JSON schemas. I noticed that if you modify the Pydantic model slightly, by uncommenting the code above so that the field defaults to an empty list, the underlying JSON schema changes: 'participants' is removed from the 'required' array.

That is, without = Field(default_factory=list) you get:

{'properties': {'name': {'title': 'Name', 'type': 'string'},
  'date': {'title': 'Date', 'type': 'string'},
  'participants': {'items': {'type': 'string'},
   'title': 'Participants',
   'type': 'array'}},
 'required': ['name', 'date', 'participants'],
 'title': 'CalendarEvent',
 'type': 'object'}

whereas with = Field(default_factory=list) you get:

{'properties': {'name': {'title': 'Name', 'type': 'string'},
  'date': {'title': 'Date', 'type': 'string'},
  'participants': {'items': {'type': 'string'},
   'title': 'Participants',
   'type': 'array'}},
 'required': ['name', 'date'],
 'title': 'CalendarEvent',
 'type': 'object'}
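
For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming Pydantic v2 (where `model_json_schema()` is available) and Python 3.9+. The class names `EventRequired` and `EventWithDefault` are made up for illustration so that both variants can live side by side; the `title` entry in each schema will therefore differ from the dumps above.

```python
from pydantic import BaseModel, Field


class EventRequired(BaseModel):
    # Variant without a default: every field must be supplied.
    name: str
    date: str
    participants: list[str]


class EventWithDefault(BaseModel):
    # Variant with a default: participants falls back to an empty list.
    name: str
    date: str
    participants: list[str] = Field(default_factory=list)


required_schema = EventRequired.model_json_schema()
default_schema = EventWithDefault.model_json_schema()

# The property definitions match; only the 'required' array differs.
print(required_schema["required"])
print(default_schema["required"])
print(required_schema["properties"] == default_schema["properties"])
```

This only shows what Pydantic generates locally. It says nothing about how the schema is transformed or interpreted once it is passed as response_format.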

My questions are:

Do these schemas get treated differently by the LLM? If so, how?
If they do get treated differently, could this lead to a functional difference between the two schemas?

tanm1 · October 3, 2024, 5:24pm

When you add Field(default_factory=list), you are giving the field a default value, so even if no value is supplied for it, the function can still run. It is therefore no longer a required parameter as far as the model is concerned. So yes, the schemas get treated differently, because they are different. The functional difference would likely be minimal, but it really depends on your use case.

sebastian.chejniak · October 3, 2024, 6:03pm

I’m a bit confused. I understand that specifying Field(default_factory=list) means that, when instantiating the class in Python, you don’t need to pass the corresponding attribute. In other words, it makes the attribute optional.

However, drawing an analogy to the Structured Outputs API, what would it mean to say that specifying the attribute is optional? The LLM would always produce a value for it, right? And it’s not possible to specify optional attributes in the Structured Outputs API.

Are you basically saying that the only difference is that the schema “tells” the LLM, somewhere in the prompt, that this field has a default value, even though this has zero impact on the deterministic schema-preserving behaviour of the Structured Outputs API?
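
To make the Python-side meaning of "optional" concrete, here is a small sketch, again assuming Pydantic v2 and using the illustrative class names `EventRequired` and `EventWithDefault` (not from the documentation). It only exercises local validation of a JSON payload and makes no claim about what the API itself emits.

```python
from pydantic import BaseModel, Field, ValidationError


class EventRequired(BaseModel):
    name: str
    date: str
    participants: list[str]


class EventWithDefault(BaseModel):
    name: str
    date: str
    participants: list[str] = Field(default_factory=list)


# A payload that omits 'participants' entirely.
payload = '{"name": "Science fair", "date": "Friday"}'

# With a default, the missing key is filled in with an empty list.
event = EventWithDefault.model_validate_json(payload)
print(event.participants)

# Without a default, the same payload fails validation.
try:
    EventRequired.model_validate_json(payload)
    missing_rejected = False
except ValidationError:
    missing_rejected = True
print(missing_rejected)

# An explicit empty list is valid for the required variant.
explicit = EventRequired.model_validate_json(
    '{"name": "Science fair", "date": "Friday", "participants": []}'
)
print(explicit.participants)
```

If the response always includes the key, as the question assumes, both variants parse it identically, and the default only matters when the key is absent.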
