Allow "annotation-only" keywords (like `description`, `title`, and `examples`) as siblings of `$ref` when using JSON structured output

When using structured output via the API, the JSON validator rejects schemas where a $ref keyword has extra sibling keywords, like description:

{
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "#/$defs/Foo",
      "description": "A Foo object."
    }
  },
  "required": ["foo"],
  "additionalProperties": false,
  "$defs": {
    "Foo": {
      "type": "object",
      "properties": {
        "id": { "type": "string" }
      },
      "required": ["id"],
      "additionalProperties": false
    }
  }
}

This results in:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'json_schema': context=('properties', 'foo'), $ref cannot have keywords {'description'}.", 'type': 'invalid_request_error', 'param': 'text.format.schema', 'code': 'invalid_json_schema'}}

This severely limits the expressivity of the schema. If a schema contains an object Foo and an object Bar, and Foo has a property of type Bar, you currently cannot give the model guidance about how a Foo relates to its Bar. You can describe the Foo and Bar objects themselves, but not the relation. You could put the Foo-Bar relation into the description of Foo and/or Bar, but this breaks down as soon as you introduce other objects that also own a Bar.
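To see which parts of a schema would trip this rule before sending a request, the check can be approximated locally. This is only a sketch that mirrors the wording of the error message ("$ref cannot have keywords ..."); the function name `find_ref_siblings` is made up for illustration, and the real server-side validator may apply other rules as well. It walks every nested value, so data stored under keywords like `examples` or `default` is scanned too.

```python
from typing import Any, Iterator


def find_ref_siblings(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, sibling_keywords) for every object that has `$ref`
    together with at least one other keyword."""
    if isinstance(node, dict):
        if "$ref" in node and len(node) > 1:
            yield path, set(node) - {"$ref"}
        for key, value in node.items():
            yield from find_ref_siblings(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_ref_siblings(item, path + (index,))


# The failing schema from this post.
schema = {
    "type": "object",
    "properties": {
        "foo": {"$ref": "#/$defs/Foo", "description": "A Foo object."}
    },
    "required": ["foo"],
    "additionalProperties": False,
    "$defs": {
        "Foo": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
            "required": ["id"],
            "additionalProperties": False,
        }
    },
}

problems = list(find_ref_siblings(schema))
print(problems)  # [(('properties', 'foo'), {'description'})]
```

The reported path has the same shape as the `context=('properties', 'foo')` value in the API error, which makes it easy to compare the two.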

This can be trivially worked around by wrapping the $ref in anyOf, but this isn’t an ideal long-term solution:

{
  "type": "object",
  "properties": {
    "foo": {
      "anyOf": [
        { "$ref": "#/$defs/Foo" }
      ],
      "description": "A Foo object."
    }
  },
  "required": ["foo"],
  "additionalProperties": false,
  "$defs": {
    "Foo": {
      "type": "object",
      "properties": {
        "id": { "type": "string" }
      },
      "required": ["id"],
      "additionalProperties": false
    }
  }
}
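For schemas that are generated rather than written by hand, the same wrapping can be applied automatically. The following is a minimal sketch, not an official utility: `wrap_ref_siblings` is a hypothetical helper name, it operates on a plain dict, and it assumes that an object with `$ref` and sibling keywords does not already have its own `anyOf` (such objects are left untouched).

```python
from copy import deepcopy
from typing import Any


def wrap_ref_siblings(node: Any) -> Any:
    """Return a copy of `node` in which every `$ref` that has sibling
    keywords is moved into a single-element `anyOf`."""
    if isinstance(node, list):
        return [wrap_ref_siblings(item) for item in node]
    if not isinstance(node, dict):
        return deepcopy(node)

    # Rewrite children first, then fix up this object.
    out = {key: wrap_ref_siblings(value) for key, value in node.items()}

    # Assumption: skip objects that already carry an `anyOf`, since
    # merging the two is ambiguous and out of scope for this sketch.
    if "$ref" in out and len(out) > 1 and "anyOf" not in out:
        ref = out.pop("$ref")
        out = {"anyOf": [{"$ref": ref}], **out}
    return out


original = {
    "type": "object",
    "properties": {
        "foo": {"$ref": "#/$defs/Foo", "description": "A Foo object."}
    },
    "$defs": {"Foo": {"type": "object"}},
}

fixed = wrap_ref_siblings(original)
print(fixed["properties"]["foo"])
```

The input is not modified, so the original schema can still be used for local validation or documentation while the rewritten copy is sent to the API.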

This is particularly a problem when working with Pydantic, which provides description, title, and examples fields that would be beneficial to include in a structured output schema (see the Fields page of the Pydantic documentation). Using structured output with a Pydantic schema like the following doesn’t work natively: it requires either wrapping the generated $ref for Bar in anyOf or dropping the description entirely.

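from pydantic import BaseModel, Field

class Bar(BaseModel):
    id: str

class Foo(BaseModel):
    # The description applies to the Foo -> Bar relation, not to Bar itself.
    bar: Bar = Field(description="A Foo's bar.")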

Here’s a more concrete example of why a description alongside a $ref field matters. The same type can mean different things depending on where it is used:

from pydantic import BaseModel, Field

class Identifier(BaseModel):
    value: str

class User(BaseModel):
    # Here "Identifier" means "the user's internal ID"
    user_id: Identifier = Field(description="Unique internal user identifier.")

class Order(BaseModel):
    # Here the same "Identifier" type means "the merchant's order number"
    order_number: Identifier = Field(description="Merchant-facing order number shown on receipts.")