Here’s the deal: if you supply a schema but it is not “strict” and is not built to comply with the strict specification, the only thing enforcing the AI’s output is the model’s own understanding of the text it received as “schema”.
Additionally, Pydantic’s default schema output only lists fields without defaults as required, doesn’t set additionalProperties to false, and makes liberal use of references and definitions. When you use the SDK’s chat completions parse() method, all of that is handled for you and the schema is forced to “strict”, but here we have to build an easily understood, higher-quality schema ourselves.
First, we want a flat schema, probably even where $defs items are reused. A helper function does that with the JSON schema output of a supported Pydantic version:
from openai import OpenAI
from pydantic import BaseModel, Field
# from typing import List  # if not using built-in types

def dereference_schema(schema: dict) -> dict:
    defs = schema.pop("$defs", {})

    def _resolve_refs(obj):
        if isinstance(obj, dict):
            if "$ref" in obj:
                ref_path = obj.pop("$ref")
                ref_name = ref_path.split("/")[-1]
                ref_schema = defs.get(ref_name)
                if ref_schema is None:
                    raise ValueError(f"Reference {ref_name} not found in definitions.")
                # Recursively resolve nested refs
                resolved_schema = _resolve_refs(ref_schema.copy())
                obj.update(resolved_schema)
            else:
                for key, value in obj.items():
                    obj[key] = _resolve_refs(value)
        elif isinstance(obj, list):
            obj = [_resolve_refs(item) for item in obj]
        return obj

    return _resolve_refs(schema)
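The helper above pops "$defs" from the dict you pass in and edits it in place. If you would rather keep the original schema intact, here is a minimal sketch of a copying variant. The name dereference_copy and the hand-written sample schema are my own illustration, not Pydantic output. One deliberate difference: keys written next to a "$ref" (such as a field description) are kept in preference to the definition's own. Like the helper above, it assumes the references are not recursive.

```python
import copy


def dereference_copy(schema: dict) -> dict:
    """Return a new schema with every local $ref inlined; the input is left untouched."""
    root = copy.deepcopy(schema)
    defs = root.pop("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            if "$ref" in node:
                name = node["$ref"].split("/")[-1]
                if name not in defs:
                    raise ValueError(f"Reference {name} not found in definitions.")
                target = resolve(copy.deepcopy(defs[name]))
                # Keys sitting beside the $ref override the definition's keys
                siblings = {k: resolve(v) for k, v in node.items() if k != "$ref"}
                return {**target, **siblings}
            return {k: resolve(v) for k, v in node.items()}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(root)


# Hand-written stand-in for a schema that uses $defs (illustrative only)
nested = {
    "$defs": {
        "Item": {"type": "object", "properties": {"name": {"type": "string"}}}
    },
    "type": "object",
    "properties": {
        "items": {"type": "array", "items": {"$ref": "#/$defs/Item"}}
    },
}

flat = dereference_copy(nested)
```

After this runs, flat has no "$defs" key and the Item definition is inlined under properties.items.items, while nested still carries its original "$ref".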
Then let’s really enhance that class schema for AI understanding:
- A useful name (shown after “# Responses” in the internal AI context)
- A useful title (the main class name)
- A useful description field for the AI to read
- All fields listed in “required”
- Extra fields disallowed by setting “additionalProperties” to false
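The last two points are easy to check mechanically before you send anything. Here is a rough sketch of such a check, assuming a schema dict that has already been flattened. The function name find_strict_violations and the sample dicts are hypothetical, and it only covers these two rules, not everything the strict specification demands.

```python
def find_strict_violations(schema, path: str = "$") -> list[str]:
    """Walk a JSON schema and report objects that miss the two rules above."""
    problems = []
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            props = schema.get("properties", {})
            # Rule: extra keys must be switched off explicitly
            if schema.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties is not false")
            # Rule: every declared property must also be listed as required
            missing = sorted(set(props) - set(schema.get("required", [])))
            if missing:
                problems.append(f"{path}: not in required: {missing}")
        for key, value in schema.items():
            problems.extend(find_strict_violations(value, f"{path}.{key}"))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            problems.extend(find_strict_violations(item, f"{path}[{i}]"))
    return problems


# Illustrative schemas, written by hand
loose = {
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "object", "properties": {}},
    },
    "required": ["a"],
}
tight = {
    "type": "object",
    "properties": {"a": {"type": "string"}},
    "required": ["a"],
    "additionalProperties": False,
}

loose_problems = find_strict_violations(loose)
tight_problems = find_strict_violations(tight)
```

An empty list means both rules hold everywhere in the schema; each entry otherwise names the path of the offending object.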
class ArticleSummary(BaseModel):
    kundenname: str = Field(..., description="Name of the customer")
    artikelname: str = Field(..., description="Name of the article")
    pznNr: str = Field(..., description="PZN number of the article")
    anzProdukte: str = Field(..., description="Number of products")
    bestellnummer: str = Field(..., description="Order number")

    class Config:
        extra = "forbid"  # ensures additionalProperties: false

class JSONListOfEachArticle(BaseModel):
    messages: list[ArticleSummary] = Field(...,
        description="Array list of article summary objects, one for every item")

    class Config:
        extra = "forbid"  # ensures additionalProperties: false
You’ll see I even talked about “objects” and what goes in them. You can enhance the descriptions even further.
Now build the metadata for the response format. Everything is prepared for this to be strict as well, and it will be accepted even when creating an assistant along with file_search, but I expect that with a vector store attached you’ll get a big 500 error.
response_schema = dereference_schema(JSONListOfEachArticle.model_json_schema())
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "JSON_article_output",
        # "strict": True,  # was possible only when not using internal tools
        "schema": response_schema
    }
}
Now you are ready to ask about your “articles” (items for sale?). The schema has been sent; here it is as actually returned in the assistant object:
"object": "assistant",
"tools": [
  {
    "type": "file_search",
    "file_search": {
      "max_num_results": null,
      "ranking_options": {
        "score_threshold": 0.0,
        "ranker": "default_2024_08_21"
      }
    }
  }
],
"response_format": {
  "json_schema": {
    "name": "JSON_article_output",
    "description": null,
    "schema_": {
      "additionalProperties": false,
      "properties": {
        "messages": {
          "description": "List of article summary objects, one for every item",
          "items": {
            "additionalProperties": false,
            "properties": {
              "kundenname": {
                "description": "Name of the customer",
                "title": "Kundenname",
                "type": "string"
              },
              "artikelname": {
                "description": "Name of the article",
                "title": "Artikelname",
                "type": "string"
              },
              "pznNr": {
                "description": "PZN number of the article",
                "title": "Pznnr",
                "type": "string"
              },
              "anzProdukte": {
                "description": "Number of products",
                "title": "Anzprodukte",
                "type": "string"
              },
              "bestellnummer": {
                "description": "Order number",
                "title": "Bestellnummer",
                "type": "string"
              }
            },
            "required": [
              "kundenname",
              "artikelname",
              "pznNr",
              "anzProdukte",
              "bestellnummer"
            ],
            "title": "ArticleSummary",
            "type": "object"
          },
          "title": "Messages",
          "type": "array"
        }
      },
      "required": [
        "messages"
      ],
      "title": "JSONListOfEachArticle",
      "type": "object"
    },
    "strict": false
  },
  "type": "json_schema"
},
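With "strict": false, nothing on the API side guarantees that the reply actually matches, so it is worth checking the text you get back. Below is a minimal sketch using only the standard library. The function name parse_articles and the sample reply are made up for illustration; the key names come from the ArticleSummary class above.

```python
import json

# Field names taken from the ArticleSummary schema
EXPECTED_KEYS = {"kundenname", "artikelname", "pznNr", "anzProdukte", "bestellnummer"}


def parse_articles(reply_text: str) -> list[dict]:
    """Parse the model's reply and verify it has the shape the schema describes."""
    data = json.loads(reply_text)
    if not isinstance(data, dict) or set(data) != {"messages"}:
        raise ValueError("top level must be an object with only a 'messages' key")
    if not isinstance(data["messages"], list):
        raise ValueError("'messages' must be an array")
    for i, item in enumerate(data["messages"]):
        if not isinstance(item, dict) or set(item) != EXPECTED_KEYS:
            raise ValueError(f"messages[{i}] does not have exactly the expected keys")
        for key, value in item.items():
            if not isinstance(value, str):
                raise ValueError(f"messages[{i}].{key} must be a string")
    return data["messages"]


# Invented sample reply, only to show the call
sample_reply = json.dumps({
    "messages": [{
        "kundenname": "Example Customer",
        "artikelname": "Example Article",
        "pznNr": "00000000",
        "anzProdukte": "2",
        "bestellnummer": "A-1",
    }]
})

articles = parse_articles(sample_reply)
```

If you already have the Pydantic classes at hand, validating the reply against JSONListOfEachArticle would do the same job with less code; the hand-rolled version just keeps the sketch free of dependencies.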