Response_format and Fields with Pydantic

hi – does anyone know where to find ALL of the parameters for response_format in JSON mode – is there a way to enforce requirements on the output similar to Instructor?

Hi @dmc1 and welcome to the forums!

I haven’t used Instructor (have heard of it; similar to Outlines). But in general setting strict to True yields strict enforcement of the schema.

Regarding what parameters are supported, OpenAI official documentation is quite comprehensive, but to provide some references:

Also note that there are some intricacies when using JSON definition for your response_format. For example, you have to provide "additionalProperties": false for every object.

So I tend to define my schema in Pydantic and just pass the Pydantic class as my response_format.

3 Likes

Thanks! but for here - https://openai.com/index/introducing-structured-outputs-in-the-api/ - for the UI output, how is the JSON output structured like this? is it the assistant specifically or is it not clear from the example? output in question.

{
“type”: “div”,
“label”: “”,
“children”: [
{
“type”: “header”,
“label”: “”,
“children”: [
{
“type”: “div”,
“label”: “Green Thumb Gardening”,
“children”: ,
“attributes”: [{ “name”: “className”, “value”: “site-title” }]
},
{
“type”: “div”,
“label”: “Bringing Life to Your Garden”,
“children”: ,
“attributes”: [{ “name”: “className”, “value”: “site-tagline” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “header” }]
},
{
“type”: “section”,
“label”: “”,
“children”: [
{
“type”: “div”,
“label”: “”,
“children”: [
{
“type”: “div”,
“label”: “About Us”,
“children”: [
{
“type”: “div”,
“label”: “At Green Thumb Gardening, we specialize in transforming your outdoor spaces into beautiful, thriving gardens. Our team has decades of experience in horticulture and landscape design.”,
“children”: ,
“attributes”: [
{ “name”: “className”, “value”: “about-description” }
]
}
],
“attributes”: [{ “name”: “className”, “value”: “about-section” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “content” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “about-container” }]
},
{
“type”: “section”,
“label”: “”,
“children”: [
{
“type”: “div”,
“label”: “”,
“children”: [
{
“type”: “div”,
“label”: “Our Services”,
“children”: [
{
“type”: “div”,
“label”: “Garden Design”,
“children”: ,
“attributes”: [
{ “name”: “className”, “value”: “service-item” }
]
},
{
“type”: “div”,
“label”: “Plant Care & Maintenance”,
“children”: ,
“attributes”: [
{ “name”: “className”, “value”: “service-item” }
]
},
{
“type”: “div”,
“label”: “Seasonal Cleanup”,
“children”: ,
“attributes”: [
{ “name”: “className”, “value”: “service-item” }
]
},
{
“type”: “div”,
“label”: “Custom Landscaping”,
“children”: ,
“attributes”: [
{ “name”: “className”, “value”: “service-item” }
]
}
],
“attributes”: [{ “name”: “className”, “value”: “services-list” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “content” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “services-container” }]
}
],
“attributes”: [{ “name”: “className”, “value”: “landing-page” }]
}

Hi @platypus !

When yo use Pydantic for response format, do yo provide the description of the fields to be returned inside the prompt or inside the description attribute of each field? Which approach works best?

When you use the Python SDK and optionally use client.beta.chat.completions.parse() as the method for sending, the best way to have each field filled properly is indeed by describing within the schema.

Here, for example, multiple parts of response production are described to the AI, for it to refer to directly when using such a schema:

class JsonResponse(BaseModel):
    chat_topic: str = Field(
        ...,
        description="The specialization or field that this encompasses."
    )
    chat_title: str = Field(
        ...,
        description="A display title for the chat, 4-6 words in length. Can remain consistent or change based on new direction."
    )
    user_question_rephrase: str = Field(
        ...,
        description="Briefly restates the user's actual task or request in their voice, inferred from the latest input and context, but made standalone."
    )
    chain_of_thought: str = Field(
        ...,
        description="Detailed internal reasoning and planning steps working towards a solution that can finally be presented."
    )
    response_to_user: str = Field(
        ...,
        description="The final output presented to the user, fulfilling their need, request, or task."
    )
    model_config = ConfigDict(extra='forbid')

If the AI is directly sending to an API, is doing a particular task, is not interacting with a user, it is best to make that a system role message telling the AI why it is only writing JSON and where the destination is being employed, so that it does the job it is assigned well.

1 Like

Yes, I switched to using field descriptions and it seems to work well. In addition, setting temperature to 0.0 and top p to 1.0 really improves extraction quality.