Pydantic response model failure

Hello,

I am using instructor and pydantic to specify a schema for an OpenAI chat completion call. I had a schema that was working perfectly fine yesterday, but it now fails with:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for function 'BalanceSheetExtract'. Please ensure it is a valid JSON Schema.", 'type': 'invalid_request_error', 'param': 'tools[0].function.parameters', 'code': 'invalid_function_parameters'}}

My schema is of the form:

from typing import List

from pydantic import BaseModel, Field

class BalanceSheetExtract(BaseModel):
    company_name: str = Field(default=None, validate_default=True)
    prev_year: int = Field(default=None, validate_default=True)
    current_year: int = Field(default=None, validate_default=True)
    liabilities: LiabilitiesExtract  # sub-models defined below / omitted
    assets: AssetsExtract
    equity: EquityExtract

class AssetsExtract(BaseModel):
    current_assets: CurrentAssetsExtract
    non_current_assets: NonCurrentAssetsExtract
    total_assets: DataField

class NonCurrentAssetsExtract(BaseModel):
    fields: List[DataField] = Field(default=None, validate_default=True)
    total_non_current_assets: DataField = Field(default=None, validate_default=True)

The other subclasses follow a similar format.

where DataField is:

class DataField(BaseModel):
    field_name: str = Field(default=None, validate_default=True)
    prev_year_value: float = Field(default=0, validate_default=True)
    current_year_value: float = Field(default=0, validate_default=True)

This structure worked fine for me yesterday. It does work, however, when I remove the fields field from NonCurrentAssetsExtract, which hints that there may be some limit to how deeply an object can be nested. Any thoughts?
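For reference, the JSON schema that pydantic generates (roughly what instructor forwards as tools[0].function.parameters) can be dumped like this, so the nesting depth is visible. A minimal sketch, assuming pydantic v2:

import json

# Print the JSON schema pydantic generates for the model; this is
# approximately what ends up as the function parameters on the OpenAI side.
print(json.dumps(BalanceSheetExtract.model_json_schema(), indent=2))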

The request is like this:

response = self.client.chat.completions.create(
    model="gpt-4o-2024-05-13",
    response_model=response_model,
    max_tokens=4096,
    max_retries=5,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                *[
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image}"},
                    }
                    for image in base64_images
                ],
            ],
        }
    ],
)

I have not updated instructor or pydantic recently, so there should be no real change in the request I am sending over.


Is the input too large for your max tokens maybe?

Hmm, unfortunately no, it happens even for small messages. If I make my response model too "deep" it seems to fail. Oddly, this is a very recent occurrence; it was working fine yesterday.


Have you verified the specific JSON output in something like jsonlint.com?


Running into the same issue. If I make my response model too deep, it seems to fail. It looks like it can only go up to 4 levels of nesting.
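If anyone wants to reproduce this without hand-writing models, here is a rough sketch that builds models of increasing depth (the Level* names are made up):

from typing import List

from pydantic import create_model

def nested_model(depth: int):
    # Level0 is a leaf; each further level wraps the previous one in a
    # list, mirroring the fields: List[DataField] pattern above.
    model = create_model("Level0", value=(str, ...))
    for i in range(1, depth + 1):
        model = create_model(f"Level{i}", children=(List[model], ...))
    return model

# Pass nested_model(n) as response_model for increasing n to find the
# depth at which the 400 invalid_function_parameters error starts.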


Running into the same issue.


Would need to look at what version of snake this is. There have been several changes in each iteration, i.e. 3.10, 3.11, 3.12.

How a class is defined and initialized can cause errors later down the road.
Those are called 'hairpullers' by most snek charmers.
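Quick way to check which snake (and which libraries) you are on:

import sys
from importlib.metadata import version

print(sys.version)            # the snake itself
print(version("pydantic"))    # pydantic release in use
print(version("instructor"))  # instructor release in use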

I reproduced your schema here in our prompt playground and it seems to work fine for gpt-4o. Feel free to modify this or let me know if you have questions.

I can't post a link, but you can paste this into promptfiddle dot com:

class BalanceSheetExtract {
  company_name string
  prev_year int
  current_year int
  assets Assets
  // liabilities LiabilitiesExtract
}

class Assets {
  // current_assets CurrentAssetsExtract
  non_current_assets NonCurrentAssets
}

class NonCurrentAssets {
  fields DataField[]
  total_non_current_assets DataField
}

class DataField {
  field_name string
  prev_year_value float
  current_year_value float
}

client<llm> GPT4o {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
  }
} 

function ExtractBalanceSheet(myImage: image) -> BalanceSheetExtract {
  client GPT4o

  prompt #"
    {{ _.role("user") }}
    Extract the following schema from the image provided

    {{ ctx.output_format }}

    {{ myImage }}

    JSON:
  "#
}

test Test1 {
  functions [ExtractBalanceSheet]
  args {
    myImage {
      url "the-url"
    }
  }
}

You can still call this from Python like this:
await baml.ExtractBalanceSheet(....)


The issue was never the model. It was the snake. It can bite.
Glad to hear that fixed it. Have a good one.
And remember, you only have so much hair. Be kind to it. :wink:

I just tested it and we still encounter the same or a similar issue.

The first example is a request to the chat completions API where it accepts the JSON schema:

curl --request POST \
  --url …/v1/chat/completions \
  --header 'Authorization: Bearer XXX' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "gpt-4-turbo",
	"messages": [
		{
			"role": "user",
			"content": "Do something!"
		}
	],
	"tools": [
		{
			"type": "function",
			"function": {
				"name": "set_response",
				"description": "Set the response.",
				"parameters": {
					"type": "object",
					"title": "response",
					"$defs": {
						"Model3": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						},
						"Model2": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"$ref": "#/definitions/Model3"
								}
							},
							"required": [
								"property1"
							]
						},
						"Model1": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model2"
									}
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						}
					},
					"properties": {
						"property1": {
							"type": "string",
							"description": "The first property."
						},
						"propertyWithRef": {
							"type": "array",
							"items": {
								"$ref": "#/definitions/Model1"
							}
						}
					},
					"required": [
						"property1"
					]
				}
			}
		}
	],
	"tool_choice": "auto"
}'

As you can see, the JSON schema uses refs up to a depth of 3.

But when I change one more layer so that the ref is used inside an array schema definition (Model2.propertyWithRef becomes an array whose items use the $ref, giving four levels of nesting instead of three), the schema isn't accepted anymore.

curl --request POST \
  --url …/v1/chat/completions \
  --header 'Authorization: Bearer XXX' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "gpt-4-turbo",
	"messages": [
		{
			"role": "user",
			"content": "Do something!"
		}
	],
	"tools": [
		{
			"type": "function",
			"function": {
				"name": "set_response",
				"description": "Set the response.",
				"parameters": {
					"type": "object",
					"title": "response",
					"$defs": {
						"Model3": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						},
						"Model2": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model3"
									}
								}
							},
							"required": [
								"property1"
							]
						},
						"Model1": {
							"type": "object",
							"properties": {
								"property1": {
									"type": "string",
									"description": "The first property."
								},
								"propertyWithRef": {
									"type": "array",
									"items": {
										"$ref": "#/definitions/Model2"
									}
								}
							},
							"required": [
								"property1",
								"propertyWithRef"
							]
						}
					},
					"properties": {
						"property1": {
							"type": "string",
							"description": "The first property."
						},
						"propertyWithRef": {
							"type": "array",
							"items": {
								"$ref": "#/definitions/Model1"
							}
						}
					},
					"required": [
						"property1"
					]
				}
			}
		}
	],
	"tool_choice": "auto"
}'

The response is:

{
	"error": {
		"message": "Invalid schema for function 'set_response'. Please ensure it is a valid JSON Schema.",
		"type": "invalid_request_error",
		"param": "tools[0].function.parameters",
		"code": "invalid_function_parameters"
	}
}

We would really appreciate it if someone could look into this issue soon.

  • Sorry for not including the full URL in the curls. This forum doesn't want me to post links…

I can reproduce this.
I think it's worth mentioning that both request bodies are valid JSON schemata, even though the invalid_request_error is only shown for the second example containing refs with a depth greater than 3.
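For what it's worth, one way to check that, where schema.json is a hypothetical file holding the parameters object copied from the curl payload:

import json

from jsonschema import Draft7Validator

# check_schema validates the document against the JSON Schema meta-schema
# and raises jsonschema.SchemaError if it is not a valid schema. Both
# parameter blocks from the requests above should pass.
with open("schema.json") as f:
    Draft7Validator.check_schema(json.load(f))
print("valid JSON Schema")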

Have you tried using a different schema format? I find that writing things like:

{
   param1: string
   param2: {
       nestedParam1: string
   }
}

as if they were type interfaces works better for us.

Not sure what you mean. As far as I know, we have to provide a valid JSON schema. Also, our schema is created by Pydantic, so there is not much we can do besides restructuring the whole schema and writing it ourselves.

But since it stopped working at some point, it seems to be an issue in the API, and I would expect it to be fixed eventually.

My current workaround is to not use functions and instead pass the schema in the system message, which works fine for our use case. A sketch is below.
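Roughly like this, reusing the BalanceSheetExtract model from the original post for illustration (a sketch using the plain OpenAI client and JSON mode instead of tools):

import json

from openai import OpenAI

client = OpenAI()  # plain client, no instructor patching

schema = json.dumps(BalanceSheetExtract.model_json_schema())

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode, no tools involved
    messages=[
        {
            "role": "system",
            "content": f"Reply only with JSON that matches this schema:\n{schema}",
        },
        {"role": "user", "content": "Do something!"},
    ],
)

# Validate the reply against the pydantic model ourselves.
result = BalanceSheetExtract.model_validate_json(
    response.choices[0].message.content
)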

We're writing a small library to convert a JSON schema (from a pydantic model) into a type interface as I mentioned above; a rough sketch of the core idea follows.

Do you call the OpenAI APIs using the Python client or another library like instructor? We should have it ready in around a week if you're interested.
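The core of it is something like this (a rough sketch; it ignores $refs, required, and unions):

def schema_to_interface(schema: dict, indent: int = 0) -> str:
    # Render an object schema as a TypeScript-style interface body.
    pad = "  " * indent
    lines = ["{"]
    for name, prop in schema.get("properties", {}).items():
        t = prop.get("type", "any")
        if t == "object":
            t = schema_to_interface(prop, indent + 1)
        elif t == "array":
            t = prop.get("items", {}).get("type", "any") + "[]"
        lines.append(f"{pad}  {name}: {t}")
    lines.append(pad + "}")
    return "\n".join(lines)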

We use OpenAI's Python client. However, as you can see in my examples above, the issue must be on OpenAI's side, since it also doesn't accept the JSON schema when we send a request using curl.

I already have a workaround, but there is still an issue with the API.

This might not be the right place to post issues that can't be solved by the community.

I see, GPT says there are some issues with the actual schema: ChatGPT. You might want to take a look. There are various versions of JSON Schema.