Structured Response: datetime format not supported with pydantic schema

It seems that the schema Pydantic generates for models with datetime fields is not accepted by the schema validation of the new Structured Outputs feature:

Python client example (v1.40.6):

```python
import json
import logging
from datetime import datetime

import pytest
from openai import BadRequestError, OpenAI
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


class JournalPublication(BaseModel):
    publisher: str = Field(description="The publisher of the journal")
    title: str = Field(description="The title of the article")
    publication_date: datetime = Field(description="The date the article was published")


def test_response_format():
    client = OpenAI()
    try:
        client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            temperature=0,
            messages=[
                {"role": "user", "content": "When did Albert Einstein publish his discovery of Special Relativity?"}
            ],
            response_format=JournalPublication
        )
    except BadRequestError as e:
        # Log the generated schema so the rejected keyword is easy to spot
        schema = json.dumps(JournalPublication.model_json_schema(), indent=2)
        logger.error("Unsupported JSON schema generated=%s", schema)
        pytest.fail(f"Failed to parse response format: {e.message}")
```
Result:

```
Failed to parse response format: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'JournalPublication': In context=('properties', 'publication_date'), 'format' is not permitted", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
```

Generated Schema:

```json
{
  "properties": {
    "publisher": {
      "description": "The publisher of the journal",
      "title": "Publisher",
      "type": "string"
    },
    "title": {
      "description": "The title of the article",
      "title": "Title",
      "type": "string"
    },
    "publication_date": {
      "description": "The date the article was published",
      "format": "date-time",
      "title": "Publication Date",
      "type": "string"
    }
  },
  "required": [
    "publisher",
    "title",
    "publication_date"
  ],
  "title": "JournalPublication",
  "type": "object"
}
```
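To find the offending keyword before a request is ever sent, a small recursive walk over the generated schema can report every sub-schema that carries `format`. This is a sketch assuming Pydantic v2; `find_keyword` is a hypothetical helper, not part of Pydantic or the OpenAI SDK. The path it reports mirrors the `context=('properties', 'publication_date')` in the 400 error above.

```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field


class JournalPublication(BaseModel):
    publisher: str = Field(description="The publisher of the journal")
    title: str = Field(description="The title of the article")
    publication_date: datetime = Field(description="The date the article was published")


def find_keyword(schema: Any, keyword: str, path: tuple = ()) -> list[tuple]:
    """Return the path of every sub-schema that uses `keyword` (hypothetical helper)."""
    hits: list[tuple] = []
    if isinstance(schema, dict):
        # Only count string-valued keywords, so a *field* named "format"
        # (whose value is a dict under "properties") is not reported.
        if isinstance(schema.get(keyword), str):
            hits.append(path)
        for key, value in schema.items():
            hits.extend(find_keyword(value, keyword, path + (key,)))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            hits.extend(find_keyword(item, keyword, path + (index,)))
    return hits


print(find_keyword(JournalPublication.model_json_schema(), "format"))
# [('properties', 'publication_date')]
```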

@mcantrell Correct. According to the docs, only the following types are supported:

  • String
  • Number
  • Boolean
  • Object
  • Array
  • Enum
  • anyOf

This is in line with the JSON data types, which don't include a datetime type.
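As a rough illustration, assuming Pydantic v2, everyday Python annotations map onto exactly those JSON Schema types, and none of them emit a `format` keyword (unlike `datetime`/`date`). The `Example`, `Author`, and `Color` models are made up for this sketch.

```python
import json
from enum import Enum
from typing import Union

from pydantic import BaseModel


class Color(str, Enum):
    RED = "red"
    GREEN = "green"


class Author(BaseModel):
    name: str


class Example(BaseModel):
    text: str                 # "type": "string"
    score: float              # "type": "number"
    flag: bool                # "type": "boolean"
    author: Author            # "$ref" to an object schema under "$defs"
    tags: list[str]           # "type": "array"
    color: Color              # "$ref" to an enum schema under "$defs"
    value: Union[str, float]  # "anyOf" of string and number


schema = Example.model_json_schema()
print(json.dumps(schema["properties"], indent=2))
```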

So the idea, then, is to use a string 🙂

OK, I'll cross my fingers and hope the date format is correct, I guess.

I updated the model, and it seems to work with Pydantic deserialization:

  1. Instruct the model to use ISO date formatting in the field description (the `pattern` attribute for properties is also not supported).
  2. Remove `format` from the generated schema by overriding Pydantic's `model_json_schema` method.
```python
from datetime import date
from typing import Any

from pydantic import BaseModel, Field


class JournalPublication(BaseModel):
    publisher: str = Field(description="The publisher of the journal")
    title: str = Field(description="The title of the article")
    publication_date: date = Field(description="The date the article was published. Use ISO 8601 to format this value.")

    @classmethod
    def model_json_schema(cls, *args, **kwargs) -> dict[str, Any]:
        schema = super().model_json_schema(*args, **kwargs)
        # Drop "format" (e.g. "date") from each top-level property; nested models are not touched
        for prop in schema.get('properties', {}).values():
            prop.pop('format', None)

        return schema
```
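The override above only walks the top-level `properties`, so a `date` inside a nested model (which Pydantic places under `$defs`) or inside an `Optional[date]` (an `anyOf`) would presumably still carry `format` and hit the same 400. Below is a sketch of a recursive variant, assuming Pydantic v2 and assuming, as the result in this post suggests, that the SDK builds the request schema from the class's `model_json_schema()`. `strip_format` and the `Author` model are hypothetical.

```python
from datetime import date
from typing import Any

from pydantic import BaseModel, Field


def strip_format(schema: dict[str, Any]) -> None:
    """Remove the 'format' keyword in place from a schema and its sub-schemas."""
    schema.pop("format", None)
    # Walk only the places where sub-schemas live, so a *field* named "format"
    # (a key inside "properties") is never removed.
    for sub in schema.get("properties", {}).values():
        strip_format(sub)
    for sub in schema.get("$defs", {}).values():
        strip_format(sub)
    if isinstance(schema.get("items"), dict):
        strip_format(schema["items"])
    for sub in schema.get("anyOf", []):
        strip_format(sub)


class Author(BaseModel):
    name: str = Field(description="The author's full name")
    born: date = Field(description="The author's birth date. Use ISO 8601 (YYYY-MM-DD).")


class JournalPublication(BaseModel):
    title: str = Field(description="The title of the article")
    publication_date: date = Field(description="The date the article was published. Use ISO 8601 to format this value.")
    author: Author

    @classmethod
    def model_json_schema(cls, *args: Any, **kwargs: Any) -> dict[str, Any]:
        schema = super().model_json_schema(*args, **kwargs)
        strip_format(schema)
        return schema


schema = JournalPublication.model_json_schema()
print(schema["$defs"]["Author"]["properties"]["born"])  # no "format" key left

# Pydantic still parses the ISO strings in a (sample) response into date objects.
sample = '{"title": "T", "publication_date": "1905-09-26", "author": {"name": "A", "born": "1879-03-14"}}'
pub = JournalPublication.model_validate_json(sample)
print(pub.publication_date)  # 1905-09-26
```

Stripping only affects the schema sent to the API; validation of the response still uses the real `date` type, which is why the ISO instruction in the description matters.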

Haha, yes 😅

One idea could be to specify the datetime format in your prompt. Something along the lines of:

'publication_date' should be provided in the following ISO 8601 format: 'YYYY-MM-DD'.

Then enforce that format when converting the value to an actual datetime object once the response is received.
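A minimal sketch of that receive-side check, assuming the field is declared as a plain `str` and the prompt asks for `YYYY-MM-DD`. `parse_iso_date` is a hypothetical helper. It pairs a strict regex with `date.fromisoformat` because `fromisoformat` alone accepts additional ISO 8601 forms on Python 3.11+ (for example `19050926`).

```python
import re
from datetime import date

_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(value: str) -> date:
    """Convert a 'YYYY-MM-DD' string from the model into a date, or raise ValueError."""
    # Reject anything that is not exactly YYYY-MM-DD (e.g. '26/09/1905' or '1905-9-26')
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # fromisoformat still rejects impossible dates such as month 13
    return date.fromisoformat(value)


print(parse_iso_date("1905-09-26"))  # 1905-09-26
for bad in ("26/09/1905", "1905-13-01"):
    try:
        parse_iso_date(bad)
    except ValueError as exc:
        print(exc)
```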

I've done similar things in the past with, e.g., ISO 639 language codes and country codes.

Ah awesome, you beat me to it!


lol yeah, just barely by the looks of it 🤣
