Hi!
So I tried to emulate a two-level categorization with Structured Outputs; I hope this is helpful to you.
For this example, I emulated an Amazon-like two-level product categorization based on a product description. For a given description you could get, e.g., Primary Category == “Electronics” and Secondary Category == “Mobile Phones”. You can make these categorizations even stricter by listing the allowed values in an `enum` field of the schema.
This is how I constructed the JSON schema:
```python
json_schema = {
    "name": "Categorization",
    "schema": {
        "type": "object",
        "properties": {
            "results": {
                "type": "array",
                "description": "List of two-level product categorization results",
                "items": {
                    "type": "object",
                    "properties": {
                        "product_id": {
                            "type": "string"
                        },
                        "primary_category": {
                            "type": "string"
                        },
                        "secondary_category": {
                            "type": "string"
                        }
                    },
                    "required": ["product_id", "primary_category", "secondary_category"],
                    "additionalProperties": False
                }
            }
        },
        "required": ["results"],
        "additionalProperties": False
    },
    "strict": True
}
```
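To try the `enum` idea mentioned at the top, one option is to generate the same schema from a category list. This is a sketch, not something I ran against the API: `TAXONOMY` and `build_categorization_schema` are hypothetical names, and the categories are just examples. Without an enum the model picks its own labels (the result further down shows both “Footwear” and “Mens Clothing” as primary categories), which an enum would prevent.

```python
# Hypothetical example taxonomy; substitute your own category names.
TAXONOMY = {
    "Electronics": ["Mobile Phones", "Televisions", "Audio Equipment", "Cameras"],
    "Home & Kitchen": ["Kitchen Appliances", "Vacuum Cleaners"],
    "Toys & Games": ["Building Sets"],
}


def build_categorization_schema(taxonomy: dict[str, list[str]]) -> dict:
    """Same shape as json_schema above, but both category fields are limited to an enum."""
    # All secondary categories in one flat, de-duplicated, sorted list.
    secondary = sorted({sub for subs in taxonomy.values() for sub in subs})
    item = {
        "type": "object",
        "properties": {
            "product_id": {"type": "string"},
            "primary_category": {"type": "string", "enum": sorted(taxonomy)},
            "secondary_category": {"type": "string", "enum": secondary},
        },
        "required": ["product_id", "primary_category", "secondary_category"],
        "additionalProperties": False,
    }
    return {
        "name": "Categorization",
        "schema": {
            "type": "object",
            "properties": {
                "results": {
                    "type": "array",
                    "description": "List of two-level product categorization results",
                    "items": item,
                }
            },
            "required": ["results"],
            "additionalProperties": False,
        },
        "strict": True,
    }
```

Note that the flat secondary list does not tie a secondary category to its primary one; the model could still pair “Cameras” with “Home & Kitchen”, so you may want to check that pairing afterwards.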
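This is the sample data I used. Note that three of the entries already include a Primary Category; as the result further down shows, the model kept those values.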
```python
product_descriptions = """
Product ID: “P123456789”, Product Description: “Apple iPhone 14 Pro Max, 256GB, Space Gray”
Product ID: “P987654321”, Product Description: “Samsung 55-Inch QLED Smart TV - QN55Q60AAFXZA”
Product ID: “P112233445”, Product Description: “Instant Pot Duo 7-in-1 Electric Pressure Cooker, 6 Quart”
Product ID: “P998877665”, Product Description: “Nike Air Max 270 Men’s Running Shoes, Black/White”
Product ID: “P334455667”, Product Description: “LEGO Star Wars: The Mandalorian The Razor Crest 75292”
Product ID: “P776655443”, Product Description: “Sony WH-1000XM4 Wireless Noise-Canceling Headphones, Black”
Product ID: “P554433221”, Product Description: “Dyson V11 Animal Cordless Vacuum Cleaner”, Primary Category: “Home & Kitchen”
Product ID: “P223344556”, Product Description: “Patagonia Men’s Better Sweater Fleece Jacket, Navy”, Primary Category: “Mens Clothing”
Product ID: “P665544332”, Product Description: “Canon EOS R6 Mirrorless Camera with RF 24-105mm Lens”, Primary Category: “Electronics”
Product ID: “P443322110”, Product Description: “Microsoft Surface Pro 7 - 12.3-inch Touch-Screen - Intel Core i5 - 8GB Memory - 256GB SSD - Platinum”
"""
```
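If your catalog lives in Python objects rather than a hand-written string, a small helper can render the same line format. A sketch only: `products` and `format_products` are hypothetical names, and the optional `primary_category` key mimics the entries above that come with a pre-assigned category.

```python
# Hypothetical input records; "primary_category" is optional.
products = [
    {"id": "P123456789", "description": "Apple iPhone 14 Pro Max, 256GB, Space Gray"},
    {"id": "P554433221", "description": "Dyson V11 Animal Cordless Vacuum Cleaner",
     "primary_category": "Home & Kitchen"},
]


def format_products(records: list[dict]) -> str:
    """Render one 'Product ID: ..., Product Description: ...' line per record."""
    lines = []
    for rec in records:
        line = f"Product ID: “{rec['id']}”, Product Description: “{rec['description']}”"
        # Only add the category hint when the record already has one.
        if "primary_category" in rec:
            line += f", Primary Category: “{rec['primary_category']}”"
        lines.append(line)
    # Leading/trailing newline to match the triple-quoted string above.
    return "\n" + "\n".join(lines) + "\n"


product_descriptions = format_products(products)
```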
And finally, this is how I made the call to the API:
```python
import json
import pprint

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": """You are a product categorization assistant.
Your job is to perform a two-level product categorization (primary and secondary categories) based on the provided product descriptions.
For example, a product may be "Electronics" in primary category, and "Mobile Phones" in the secondary category.
You are to return the categorizations, along with the associated product ID, as per the enclosed scheme.""",
        },
        {
            "role": "user",
            "content": product_descriptions,
        },
    ],
    # Structured Outputs: constrain the reply to the json_schema defined above
    response_format={
        "type": "json_schema",
        "json_schema": json_schema,
    },
)
```
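If you run this more than once, it may help to wrap the call in a function that checks for a refusal before parsing: with Structured Outputs, a declined request puts its explanation in the message's `refusal` field instead of returning JSON in `content`. A minimal sketch, where `categorize` is a hypothetical name, `client` is assumed to be an `OpenAI()` instance as above, and the system prompt is passed in rather than hard-coded.

```python
import json


def categorize(client, system_prompt: str, descriptions: str, schema: dict,
               model: str = "gpt-4o-2024-08-06") -> list[dict]:
    """Hypothetical helper wrapping the call above; returns the list under "results"."""
    response = client.beta.chat.completions.parse(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": descriptions},
        ],
        response_format={"type": "json_schema", "json_schema": schema},
    )
    message = response.choices[0].message
    # A refusal carries text in `refusal` and no schema-conforming JSON in `content`.
    if getattr(message, "refusal", None):
        raise RuntimeError(f"Model refused the request: {message.refusal}")
    return json.loads(message.content)["results"]
```

Usage would then be something like `results = categorize(client, system_prompt, product_descriptions, json_schema)`, with `system_prompt` holding the prompt text above.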
This results in the following:
```python
pprint.pprint(json.loads(response.choices[0].message.content), indent=4)
```

```
{   'results': [   {   'primary_category': 'Electronics',
                       'product_id': 'P123456789',
                       'secondary_category': 'Mobile Phones'},
                   {   'primary_category': 'Electronics',
                       'product_id': 'P987654321',
                       'secondary_category': 'Televisions'},
                   {   'primary_category': 'Home & Kitchen',
                       'product_id': 'P112233445',
                       'secondary_category': 'Kitchen Appliances'},
                   {   'primary_category': 'Footwear',
                       'product_id': 'P998877665',
                       'secondary_category': "Men's Shoes"},
                   {   'primary_category': 'Toys & Games',
                       'product_id': 'P334455667',
                       'secondary_category': 'Building Sets'},
                   {   'primary_category': 'Electronics',
                       'product_id': 'P776655443',
                       'secondary_category': 'Audio Equipment'},
                   {   'primary_category': 'Home & Kitchen',
                       'product_id': 'P554433221',
                       'secondary_category': 'Vacuum Cleaners'},
                   {   'primary_category': 'Mens Clothing',
                       'product_id': 'P223344556',
                       'secondary_category': 'Jackets & Coats'},
                   {   'primary_category': 'Electronics',
                       'product_id': 'P665544332',
                       'secondary_category': 'Cameras'},
                   {   'primary_category': 'Computers & Tablets',
                       'product_id': 'P443322110',
                       'secondary_category': 'Tablets'}]}
```
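All ten IDs came back here, but the schema only constrains the shape of each item; it does not guarantee that every product is returned exactly once. A cheap post-check can compare IDs against the input and group the categories for review. This is a sketch: the helper names are hypothetical, and the regex assumes the `Product ID: “P<digits>”` format of the sample data.

```python
import re
from collections import defaultdict

# Assumes IDs look like “P123456789” inside curly quotes, as in the sample data.
ID_PATTERN = re.compile(r"Product ID: “(P\d+)”")


def check_coverage(descriptions: str, results: list[dict]) -> tuple[set[str], set[str]]:
    """Return (IDs missing from results, IDs in results that were not in the input)."""
    sent = set(ID_PATTERN.findall(descriptions))
    returned = {r["product_id"] for r in results}
    return sent - returned, returned - sent


def group_by_primary(results: list[dict]) -> dict[str, list[str]]:
    """Map each primary category to the sorted secondary categories seen under it."""
    tree = defaultdict(set)
    for r in results:
        tree[r["primary_category"]].add(r["secondary_category"])
    return {primary: sorted(subs) for primary, subs in sorted(tree.items())}
```

With the response above, `check_coverage(product_descriptions, json.loads(response.choices[0].message.content)["results"])` should give two empty sets when nothing was dropped or invented.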
You can try playing around with my `json_schema` above and tweak it to your needs. I hope this helps you!