Take the following example (slightly modified from the documentation so that the event has no participants):
from pydantic import BaseModel  # , Field
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]  # = Field(default_factory=list)

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "There are no participants going to the science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
event = completion.choices[0].message.parsed
print(event)
The model can (and typically does) return an empty list for participants. I am admittedly unfamiliar with JSON schemas, but I noticed that if you modify the Pydantic model slightly, by uncommenting the code above so that participants defaults to an empty list, the underlying JSON schema changes: 'participants' is removed from the 'required' array.
That is, without = Field(default_factory=list) you get:
{'properties': {'name': {'title': 'Name', 'type': 'string'},
                'date': {'title': 'Date', 'type': 'string'},
                'participants': {'items': {'type': 'string'},
                                 'title': 'Participants',
                                 'type': 'array'}},
 'required': ['name', 'date', 'participants'],
 'title': 'CalendarEvent',
 'type': 'object'}
whereas with = Field(default_factory=list) you get:
{'properties': {'name': {'title': 'Name', 'type': 'string'},
                'date': {'title': 'Date', 'type': 'string'},
                'participants': {'items': {'type': 'string'},
                                 'title': 'Participants',
                                 'type': 'array'}},
 'required': ['name', 'date'],
 'title': 'CalendarEvent',
 'type': 'object'}
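For anyone who wants to reproduce the comparison locally without calling the API, here is a minimal sketch. It assumes Pydantic v2 (where `model_json_schema()` is available); the class names `CalendarEventRequired` and `CalendarEventDefault` are made up for illustration, so the 'title' entries will differ from the dumps above.

```python
from pydantic import BaseModel, Field


# Variant 1: participants has no default, so it is a required field.
class CalendarEventRequired(BaseModel):
    name: str
    date: str
    participants: list[str]


# Variant 2: participants defaults to an empty list, so it becomes optional.
class CalendarEventDefault(BaseModel):
    name: str
    date: str
    participants: list[str] = Field(default_factory=list)


# model_json_schema() returns the JSON schema as a plain dict (Pydantic v2).
required_schema = CalendarEventRequired.model_json_schema()
default_schema = CalendarEventDefault.model_json_schema()

# Only the 'required' array should differ between the two variants.
print(required_schema["required"])
print(default_schema["required"])
```

This only shows what Pydantic generates; it says nothing about how the schema is transformed or enforced once it is sent as `response_format`.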
My questions are:
- Do these schemas get treated differently by the LLM? If so, how?
- If they do get treated differently, could this lead to a functional difference between the two schemas?