Pydantic response model failure

Hello,

I am using instructor and pydantic to specify a schema for an OpenAI chat completion call. I had a schema that was working perfectly fine yesterday, but it now fails with:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for function 'BalanceSheetExtract'. Please ensure it is a valid JSON Schema.", 'type': 'invalid_request_error', 'param': 'tools[0].function.parameters', 'code': 'invalid_function_parameters'}}

My schema is of the form:

from typing import List

from pydantic import BaseModel, Field

class BalanceSheetExtract(BaseModel):
    company_name: str = Field(default=None, validate_default=True)
    prev_year: int = Field(default=None, validate_default=True)
    current_year: int = Field(default=None, validate_default=True)
    liabilities: LiabilitiesExtract  # sub-models defined below / omitted
    assets: AssetsExtract
    equity: EquityExtract

class AssetsExtract(BaseModel):
    current_assets: CurrentAssetsExtract
    non_current_assets: NonCurrentAssetsExtract
    total_assets: DataField

class NonCurrentAssetsExtract(BaseModel):
    fields: List[DataField] = Field(default=None, validate_default=True)
    total_non_current_assets: DataField = Field(default=None, validate_default=True)

The other subclasses follow a similar format.

where DataField is:

class DataField(BaseModel):
    field_name: str = Field(default=None, validate_default=True)
    prev_year_value: float = Field(default=0, validate_default=True)
    current_year_value: float = Field(default=0, validate_default=True)

This structure worked fine for me yesterday. It does work, however, when I remove the fields field from NonCurrentAssetsExtract, which hints that there may be some limit to how deeply an object can be nested. Any thoughts?
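For reference, the JSON schema that pydantic generates (roughly what instructor forwards as tools[0].function.parameters) can be dumped like this, so the nesting depth is visible. A minimal sketch, assuming pydantic v2:

import json

# Print the JSON schema pydantic generates for the model; this is
# approximately what ends up as the function parameters on the OpenAI side.
print(json.dumps(BalanceSheetExtract.model_json_schema(), indent=2))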

The request is like this:

response = self.client.chat.completions.create(
    model="gpt-4o-2024-05-13",
    response_model=response_model,
    max_tokens=4096,
    max_retries=5,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                *[
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image}"},
                    }
                    for image in base64_images
                ],
            ],
        }
    ],
)

I have not updated instructor or pydantic recently, so there should be no real change in the request I am sending over.


Is the input too large for your max tokens maybe?

Hmm, unfortunately no, it happens even for small messages. If I make my response model too "deep" it seems to fail. Oddly, this is a very recent occurrence; it was working fine yesterday.


Have you verified the specific JSON output in something like jsonlint.com?


Running into the same issue. If I make my response model too deep, it seems to fail. It looks like it can only go up to 4 levels of nesting.
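If anyone wants to reproduce this without hand-writing models, here is a rough sketch that builds models of increasing depth (the Level* names are made up):

from typing import List

from pydantic import create_model

def nested_model(depth: int):
    # Level0 is a leaf; each further level wraps the previous one in a
    # list, mirroring the fields: List[DataField] pattern above.
    model = create_model("Level0", value=(str, ...))
    for i in range(1, depth + 1):
        model = create_model(f"Level{i}", children=(List[model], ...))
    return model

# Pass nested_model(n) as response_model for increasing n to find the
# depth at which the 400 invalid_function_parameters error starts.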


Running into the same issue.


Would need to look at what version of snake this is. There have been several changes in each iteration, i.e. 3.10, 3.11, 3.12.

How a class is defined and initialized can cause errors later down the road.
Those are called 'hairpullers' by most snek charmers.
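Quick way to check which snake (and which libraries) you are on:

import sys
from importlib.metadata import version

print(sys.version)            # the snake itself
print(version("pydantic"))    # pydantic release in use
print(version("instructor"))  # instructor release in use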

I reproduced your schema here in our prompt playground and it seems to work fine for gpt-4o. Feel free to modify this or let me know if you have questions.

I can't post a link, but you can paste this into promptfiddle dot com:

class BalanceSheetExtract {
  company_name string
  prev_year int
  current_year int
  assets Assets
  // liabilities LiabilitiesExtract
}

class Assets {
  // current_assets CurrentAssetsExtract
  non_current_assets NonCurrentAssets
}

class NonCurrentAssets {
  fields DataField[]
  total_non_current_assets DataField
}

class DataField {
  field_name string
  prev_year_value float
  current_year_value float
}

client<llm> GPT4o {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
  }
} 

function ExtractBalanceSheet(myImage: image) -> BalanceSheetExtract {
  client GPT4o

  prompt #"
    {{ _.role("user") }}
    Extract the following schema from the image provided

    {{ ctx.output_format }}

    {{ myImage }}

    JSON:
  "#
}

test Test1 {
  functions [ExtractBalanceSheet]
  args {
    myImage {
      url "the-url"
    }
  }
}

You can still call this from Python like this:
await baml.ExtractBalanceSheet(....)


The issue was never the model. It was the snake. It can bite.
Glad to hear that fixed it. Have a good one.
And remember, you only have so much hair. Be kind to it. :wink:

I just tested it and we still encounter the same or a similar issue.

The first example is a request to the chat completions API where it accepts the JSON schema:

curl --request POST \
  --url …/v1/chat/completions \
  --header 'Authorization: Bearer XXX' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "gpt-4-turbo",
	"messages": [
		{
			"role": "user",
			"content": "Do something!"
		}
	],
	"tools": [
		{
			"type": "function",
			"function": {
				"name": "set_response",
				"description": "Set the response.",
				"parameters": {
					"type": "object",
					"title": "response",
					"$defs": {
						"Model3": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						},
						"Model2": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"$ref": "#/definitions/Model3"
								}
							},
							"required": [
								"property1"
							]
						},
						"Model1": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model2"
									}
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						}
					},
					"properties": {
						"property1": {
							"type": "string",
							"description": "The first property."
						},
						"propertyWithRef": {
							"type": "array",
							"items": {
								"$ref": "#/definitions/Model1"
							}
						}
					},
					"required": [
						"property1"
					]
				}
			}
		}
	],
	"tool_choice": "auto"
}'

As you can see, the JSON schema uses refs up to a depth of 3.

But when I change one more layer so that the ref is used inside an array schema definition (Model2.propertyWithRef becomes an array whose items use the $ref, giving four levels of nesting instead of three), the schema isn't accepted anymore.

curl --request POST \
  --url …/v1/chat/completions \
  --header 'Authorization: Bearer XXX' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "gpt-4-turbo",
	"messages": [
		{
			"role": "user",
			"content": "Do something!"
		}
	],
	"tools": [
		{
			"type": "function",
			"function": {
				"name": "set_response",
				"description": "Set the response.",
				"parameters": {
					"type": "object",
					"title": "response",
					"$defs": {
						"Model3": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						},
						"Model2": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model3"
									}
								}
							},
							"required": [
								"property1"
							]
						},
						"Model1": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model2"
									}
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						}
					},
					"properties": {
						"property1": {
							"type": "string",
							"description": "The first property."
						},
						"propertyWithRef": {
							"type": "array",
							"items": {
								"$ref": "#/definitions/Model1"
							}
						}
					},
					"required": [
						"property1"
					]
				}
			}
		}
	],
	"tool_choice": "auto"
}'

The response is:

{
	"error": {
		"message": "Invalid schema for function 'set_response'. Please ensure it is a valid JSON Schema.",
		"type": "invalid_request_error",
		"param": "tools[0].function.parameters",
		"code": "invalid_function_parameters"
	}
}

We would really appreciate it if someone could look into this issue soon.

  • Sorry for not including the full URL in the curls. This forum doesn't want me to post links…

I can reproduce this.
I think it's worth mentioning that both request bodies are valid JSON schemata, even though the invalid_request_error is only shown for the second example containing refs with a depth greater than 3.
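For what it's worth, one way to check that, where schema.json is a hypothetical file holding the parameters object copied from the curl payload:

import json

from jsonschema import Draft7Validator

# check_schema validates the document against the JSON Schema meta-schema
# and raises jsonschema.SchemaError if it is not a valid schema. Both
# parameter blocks from the requests above should pass.
with open("schema.json") as f:
    Draft7Validator.check_schema(json.load(f))
print("valid JSON Schema")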

Have you tried using a different schema format? I find that writing things like:

{
   param1: string
   param2: {
       nestedParam1: string
   }
}

as if they were type interfaces works better for us.

Not sure what you mean. As far as I know, we have to provide a valid JSON schema. Also, our schema is created by Pydantic, so there is not much we can do besides restructuring the whole schema and writing it ourselves.

But since it stopped working at some point, it seems to be an issue in the API, and I would expect it to be fixed eventually.

My current workaround is to not use functions and instead pass the schema in the system message, which works fine for our use case. A sketch is below.
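Roughly like this, reusing the BalanceSheetExtract model from the original post for illustration (a sketch using the plain OpenAI client and JSON mode instead of tools):

import json

from openai import OpenAI

client = OpenAI()  # plain client, no instructor patching

schema = json.dumps(BalanceSheetExtract.model_json_schema())

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode, no tools involved
    messages=[
        {
            "role": "system",
            "content": f"Reply only with JSON that matches this schema:\n{schema}",
        },
        {"role": "user", "content": "Do something!"},
    ],
)

# Validate the reply against the pydantic model ourselves.
result = BalanceSheetExtract.model_validate_json(
    response.choices[0].message.content
)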

We're writing a small library to convert a JSON schema (from a pydantic model) into a type interface as I mentioned above; a rough sketch of the core idea follows.

Do you call the OpenAI APIs using the Python client or another library like instructor? We should have it ready in around a week if you're interested.
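The core of it is something like this (a rough sketch; it ignores $refs, required, and unions):

def schema_to_interface(schema: dict, indent: int = 0) -> str:
    # Render an object schema as a TypeScript-style interface body.
    pad = "  " * indent
    lines = ["{"]
    for name, prop in schema.get("properties", {}).items():
        t = prop.get("type", "any")
        if t == "object":
            t = schema_to_interface(prop, indent + 1)
        elif t == "array":
            t = prop.get("items", {}).get("type", "any") + "[]"
        lines.append(f"{pad}  {name}: {t}")
    lines.append(pad + "}")
    return "\n".join(lines)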

We use OpenAI's Python client. However, as you can see in my examples above, the issue must be on OpenAI's side, since it also doesn't accept the JSON schema when we send a request using curl.

I already have a workaround, but there is still an issue with the API.

This might not be the right place to post issues that can't be solved by the community.

I see, GPT says there are some issues with the actual schema: ChatGPT. You might want to take a look. There are various versions of JSON Schema.