What is the best way to allow the Structured Outputs API to return empty objects?

Take the following example (slightly modified from the documentation so that the event has no participants):

from pydantic import BaseModel#, Field
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str] #= Field(default_factory=list)

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "There are no participants going to the science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

print(event)

The model can (and typically does) return an empty list for participants. I am admittedly unfamiliar with JSON Schema. I noticed that if you modify the Pydantic model slightly, by uncommenting the commented-out parts of the code above so that the field defaults to an empty list, the underlying JSON schema changes: ‘participants’ is removed from the ‘required’ array.

That is, without = Field(default_factory=list) you get:

{'properties': {'name': {'title': 'Name', 'type': 'string'},
  'date': {'title': 'Date', 'type': 'string'},
  'participants': {'items': {'type': 'string'},
   'title': 'Participants',
   'type': 'array'}},
 'required': ['name', 'date', 'participants'],
 'title': 'CalendarEvent',
 'type': 'object'}

whereas with = Field(default_factory=list) you get:

{'properties': {'name': {'title': 'Name', 'type': 'string'},
  'date': {'title': 'Date', 'type': 'string'},
  'participants': {'items': {'type': 'string'},
   'title': 'Participants',
   'type': 'array'}},
 'required': ['name', 'date'],
 'title': 'CalendarEvent',
 'type': 'object'}
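
For anyone who wants to reproduce the two dumps above locally without calling the API, here is a minimal sketch. It assumes Pydantic v2, where the method is model_json_schema(). The class names RequiredParticipants and DefaultedParticipants are made up for the comparison and are not from the documentation.

```python
from pydantic import BaseModel, Field


class RequiredParticipants(BaseModel):
    name: str
    date: str
    participants: list[str]


class DefaultedParticipants(BaseModel):
    name: str
    date: str
    participants: list[str] = Field(default_factory=list)


# model_json_schema() is the Pydantic v2 method; v1 used .schema() instead.
required_schema = RequiredParticipants.model_json_schema()
defaulted_schema = DefaultedParticipants.model_json_schema()

# Only the 'required' array should differ between the two variants.
print(required_schema["required"])
print(defaulted_schema["required"])
```

This only shows what Pydantic generates on the client side. It says nothing about what the API or the SDK does with the schema afterwards.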

My questions are:

  1. Do these schemas get treated differently by the LLM? If so, how?
  2. If they do get treated differently, could this lead to a functional difference between the two schemas?

When you add Field(default_factory=list), you are giving the field a default value, so the object can still be constructed even if no value is supplied for that field. It is therefore no longer a required field in the schema that the model sees. So yes, the schemas get treated differently, because they are different. The functional difference would likely be minimal, but it really depends on your use case.

I’m a bit confused. I understand that specifying Field(default_factory=list) means that, when instantiating the class in Python, you don’t need to supply the corresponding attribute. In other words, it makes the attribute optional.
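
To make the Python-side behaviour concrete, here is a small sketch (again assuming Pydantic v2; DefaultedEvent and StrictEvent are illustrative names, not from the documentation):

```python
from pydantic import BaseModel, Field, ValidationError


class DefaultedEvent(BaseModel):
    name: str
    date: str
    participants: list[str] = Field(default_factory=list)


class StrictEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


# With a default factory, the attribute may be omitted and becomes [].
event = DefaultedEvent(name="Science fair", date="Friday")
print(event.participants)

# Without a default, omitting the attribute raises a validation error.
try:
    StrictEvent(name="Science fair", date="Friday")
    missing_rejected = False
except ValidationError:
    missing_rejected = True
print(missing_rejected)
```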

However, drawing an analogy to the Structured Outputs API, what would it mean to say that the attribute is optional? The LLM would always produce a value for it, right? And it’s not possible to specify optional attributes in the Structured Outputs API.

Are you basically saying that the only difference is that the schema “tells” the LLM, somewhere in the prompt, that this field has a default value, even though this has zero impact on the deterministic schema-preserving behaviour of the Structured Outputs API?
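
For reference, my reading of the Structured Outputs documentation is that strict mode expects every property of an object to be listed in 'required' and 'additionalProperties' to be false, with "optional" fields emulated by allowing null. The sketch below checks those two rules locally using only the standard library. strict_mode_problems is a hypothetical helper written for this thread, not part of any SDK, and it only looks at the top-level object.

```python
def strict_mode_problems(schema: dict) -> list[str]:
    """Return reasons a top-level object schema breaks the two rules above.

    Hypothetical helper for illustration; it does not recurse into nested
    objects and does not cover any other strict-mode restrictions.
    """
    problems = []
    properties = schema.get("properties", {})
    required = schema.get("required", [])
    missing = [key for key in properties if key not in required]
    if missing:
        problems.append(f"not listed in 'required': {missing}")
    if schema.get("additionalProperties") is not False:
        problems.append("'additionalProperties' is not set to false")
    return problems


# The schema generated with Field(default_factory=list), as quoted earlier.
defaulted_schema = {
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "date": {"title": "Date", "type": "string"},
        "participants": {
            "items": {"type": "string"},
            "title": "Participants",
            "type": "array",
        },
    },
    "required": ["name", "date"],
    "title": "CalendarEvent",
    "type": "object",
}

# A variant that satisfies both rules: every key is required, and the
# "optional" field is expressed as array-or-null instead of being omitted.
strict_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {
            "anyOf": [
                {"type": "array", "items": {"type": "string"}},
                {"type": "null"},
            ]
        },
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

print(strict_mode_problems(defaulted_schema))
print(strict_mode_problems(strict_schema))
```

In this particular example an empty list already expresses "no participants", so the nullable variant is only there to show how an optional value could be represented while keeping the key required.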