Hello everyone,
I’m having some trouble using Pydantic structured outputs in batch mode. The following is a toy example outlining my problem.
import json
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

fruits = ["apple", "banana", "orange", "strawberry"]

class Response(BaseModel):
    description: str = Field(description="A short description of the fruit")
    colour: str = Field(description="The fruit's colour")

tasks = []
for fruit in fruits:
    task = {
        "custom_id": f"task-{fruit}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "temperature": 0.1,
            "messages": [
                {
                    "role": "system",
                    "content": "I will give you a fruit, you will provide the information outlined in the structured output"
                },
                {
                    "role": "user",
                    "content": fruit
                }
            ],
            "response_format": Response
        }
    }
    tasks.append(task)

# Creating and uploading the file
file_name = "test/batch_tasks_fruit.jsonl"
with open(file_name, 'w') as file:
    for obj in tasks:
        file.write(json.dumps(obj) + '\n')

batch_file = client.files.create(
    file=open(file_name, "rb"),
    purpose="batch"
)
print(batch_file)

# Creating the batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
This code throws a TypeError: Object of type ModelMetaclass is not JSON serializable. After looking around for a bit, I found a helpful answer in another post on this forum, which suggests converting the model to a JSON schema first using Pydantic's model_json_schema() method.
The batches created with:
    ....
            }
        ],
        "response_format": Response.model_json_schema()
    }
    ....
…fail, however. The error for each task is: Invalid value: 'object'. Supported values are: 'json_object', 'json_schema', and 'text'.
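A likely reading of that error: the API looks at response_format["type"], and the raw schema from model_json_schema() has "type": "object" at its top level, which is not one of the accepted values. The Chat Completions API expects the schema to be wrapped in an envelope of the form {"type": "json_schema", "json_schema": {...}}. The sketch below shows one way to build that envelope. The helper names (close_objects, to_response_format) and the schema name "fruit_response" are made up for illustration, and the schema dict is hand-written to approximate what Response.model_json_schema() returns, so the example runs without pydantic installed.

```python
import copy
import json


def close_objects(node):
    """Recursively set additionalProperties=False on every object schema.

    Strict structured outputs require this on each object in the schema.
    """
    if isinstance(node, dict):
        if node.get("type") == "object":
            node["additionalProperties"] = False
        for value in node.values():
            close_objects(value)
    elif isinstance(node, list):
        for item in node:
            close_objects(item)


def to_response_format(name, schema):
    """Wrap a plain JSON schema in the envelope that response_format expects."""
    strict_schema = copy.deepcopy(schema)  # leave the caller's dict untouched
    close_objects(strict_schema)
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": strict_schema},
    }


# Approximation of Response.model_json_schema() for the toy model above.
# In a real project this dict would come from the Pydantic class itself.
response_schema = {
    "title": "Response",
    "type": "object",
    "properties": {
        "description": {"type": "string", "description": "A short description of the fruit"},
        "colour": {"type": "string", "description": "The fruit's colour"},
    },
    "required": ["description", "colour"],
}

response_format = to_response_format("fruit_response", response_schema)

# The envelope is a plain dict, so the batch line serializes without a TypeError.
body = {"model": "gpt-4o-mini", "messages": [], "response_format": response_format}
line = json.dumps(body)
```

With "strict": True, every property also has to appear in "required", so model fields with default values may need adjusting before the schema is accepted.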
My question, then, is: how can I use Pydantic classes for structured outputs in batch mode? Is it even supported right now? I would also rather avoid having to manually convert the Pydantic classes to a JSON schema, because my actual project uses nested classes. I have noticed that converting nested Pydantic classes to JSON has an odd effect on the resulting schema: nested classes are referenced instead of being inserted directly at the point where they are used.
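On the nested-class point: the references are standard JSON Schema. Pydantic places nested models under "$defs" and points to them with "$ref". As far as I know, OpenAI's structured outputs documentation lists definitions as supported, so inlining may not be needed at all. If a flat schema is still preferred, a minimal sketch of expanding local references could look like the following. The inline_refs helper and the Fruit/Nutrition schema are hypothetical, and the schema is hand-written to mimic Pydantic's output shape so the example needs no pydantic.

```python
import copy


def inline_refs(schema):
    """Return a copy of `schema` with local "#/$defs/..." references expanded in place."""
    defs = schema.get("$defs", {})

    def resolve(node, seen):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                name = ref[len("#/$defs/"):]
                if name in seen:
                    # Self-referencing models cannot be flattened.
                    raise ValueError(f"recursive reference to {name!r} cannot be inlined")
                # Keys sitting next to $ref (e.g. a description) override the target's keys.
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                merged = {**copy.deepcopy(defs[name]), **siblings}
                return resolve(merged, seen | {name})
            # Drop the $defs block itself; its contents are inlined where used.
            return {k: resolve(v, seen) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        return node

    return resolve(schema, frozenset())


# Hypothetical nested schema in the shape Pydantic produces for a model
# with a nested model field.
nested_schema = {
    "$defs": {
        "Nutrition": {
            "title": "Nutrition",
            "type": "object",
            "properties": {"calories": {"type": "integer"}},
            "required": ["calories"],
        }
    },
    "title": "Fruit",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nutrition": {"$ref": "#/$defs/Nutrition"},
    },
    "required": ["name", "nutrition"],
}

flat = inline_refs(nested_schema)
```

The input dict is left unchanged, so the same schema can be sent either with references or flattened.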