Structured Outputs: "Invalid schema for response_format, Extra required key supplied". Do OpenAI and Zod not work together for nested structures?

I’m trying to implement Structured Outputs in a Node app (GPT-4o, API version 2024-08-01-preview), but it gives me this error:

400 Invalid schema for response_format ‘book’: In context=(), ‘required’ is required to be supplied and to be an array including every key in properties. Extra required key ‘chapters’ supplied.

It seems to point to my schema below (simplified here). As I read in the docs, all properties should be listed under required, including the optional ones, so the schema below seems correct. (I also tried different variants of .required(), and leaving out .required() completely, all with the same outcome.) If I remove chapters: z.record(z.string(), ChapterSchema), the error goes away, but then the chapters are not returned. What is wrong with the code below?

import { AzureOpenAI } from 'openai';
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const ChapterSchema = z
  .object({
    title: z.string(),
    subtitle: z.string().nullable(),
  })
  .strict();

export const BookSchema = z
  .object({
    author: z.string(),
    publisher: z.string().nullable(),
    chapters: z.record(z.string(), ChapterSchema),
  })
  .required([
    'author',
    'publisher',
    'chapters',
  ])
  .strict();

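// With an empty options object, AzureOpenAI reads its endpoint, API key and
// API version from environment variables.
const openai = new AzureOpenAI({});
// Top-level await requires an ES module; `prompt` is the messages array,
// defined elsewhere (omitted in this simplified example).
const response = await openai.beta.chat.completions.parse({
    model: process.env.OPENAI_MODEL || '',
    messages: prompt,
    response_format: zodResponseFormat(BookSchema, 'book'),
});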
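Read literally, "Extra required key 'chapters' supplied" means that in the schema at context=() (presumably the root schema) the key chapters appears in required but not in properties. One plausible reading is therefore that the JSON Schema generated from z.record(...) did not end up with chapters under properties, even though it stayed in required. The sketch below is a hypothetical, dependency-free checker (`checkRequired` and the `JsonSchema` type are made up for illustration; they are not part of openai or zod, and this is not OpenAI's actual validator) that applies the rule quoted in the error to every object schema it finds. You could run it on the generated schema before sending it, for example the `schema` object that `zodResponseFormat` embeds under `json_schema` (check the exact field layout in your SDK version).

```typescript
// Minimal JSON Schema shape used by this sketch (only the fields it inspects).
type JsonSchema = {
  type?: string | string[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  additionalProperties?: boolean | JsonSchema;
  items?: JsonSchema;
};

// Hypothetical checker, NOT OpenAI's validator: it applies the rule quoted in
// the error ("'required' ... an array including every key in properties")
// to every object schema it finds, recursively.
function checkRequired(schema: JsonSchema, path: string[] = []): string[] {
  const problems: string[] = [];
  const context = `(${path.join('.')})`; // "()" for the root schema
  if (schema.properties !== undefined) {
    const keys = Object.keys(schema.properties);
    const required = schema.required ?? [];
    for (const key of keys) {
      if (!required.includes(key)) problems.push(`${context}: missing required key '${key}'`);
    }
    for (const key of required) {
      if (!keys.includes(key)) problems.push(`${context}: extra required key '${key}'`);
    }
    for (const [key, child] of Object.entries(schema.properties)) {
      problems.push(...checkRequired(child, [...path, key]));
    }
  }
  // Also descend into map-value schemas ('*') and array-item schemas ('[]').
  if (typeof schema.additionalProperties === 'object') {
    problems.push(...checkRequired(schema.additionalProperties, [...path, '*']));
  }
  if (schema.items !== undefined) {
    problems.push(...checkRequired(schema.items, [...path, '[]']));
  }
  return problems;
}

// A root schema shaped the way the error message describes: 'chapters' is
// listed in `required` but is missing from `properties`.
const suspect: JsonSchema = {
  type: 'object',
  properties: {
    author: { type: 'string' },
    publisher: { type: ['string', 'null'] },
  },
  required: ['author', 'publisher', 'chapters'],
  additionalProperties: false,
};

console.log(checkRequired(suspect)); // one entry: (): extra required key 'chapters'
```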

UPDATE:

If I build the JSON schema manually and pass it as response_format instead of using zod, it turns out the crucial thing is to put ChapterSchema in a definitions block and reference it from inside the book schema. Only then was I able to get it to work:

const openai = new AzureOpenAI({});
const response = await openai.beta.chat.completions.parse({
    model: process.env.OPENAI_MODEL || '',
    messages: prompt,
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'book',
        strict: true,
        schema: {
          type: 'object',
          properties: {
            author: {
              type: 'string',
            },
            publisher: {
              type: ['string', 'null'],
            },
            chapters: {
              type: 'object',
              additionalProperties: {
                $ref: '#/definitions/ChapterSchema',
              },
            },
          },
          // Required only includes 'author' and 'publisher'
          required: ['author', 'publisher'],
          additionalProperties: false,
          definitions: {
            ChapterSchema: {
              type: 'object',
              properties: {
                title: {
                  type: 'string',
                },
                subtitle: {
                  type: ['string', 'null'],
                },
              },
              required: ['title', 'subtitle'],
              additionalProperties: false,
            },
          },
        },
      },
    },
});
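The `$ref: '#/definitions/ChapterSchema'` entry is a JSON Pointer into the same schema document: `#` is the schema root and each `/`-separated segment walks one level down, so every value of chapters is constrained by the object stored under definitions.ChapterSchema. The sketch below is a hypothetical helper (`resolveLocalRef` is illustrative, not from any library) that follows such a local pointer. It handles only same-document references and the `~0`/`~1` escapes from RFC 6901, not percent-encoding or remote URIs.

```typescript
// Hypothetical helper: resolves a local JSON Pointer reference such as
// '#/definitions/ChapterSchema' against the root schema object.
function resolveLocalRef(root: unknown, ref: string): unknown {
  if (!ref.startsWith('#')) {
    throw new Error(`Only local references are supported: ${ref}`);
  }
  const pointer = ref.slice(1); // drop the leading '#'
  if (pointer === '') return root; // '#' alone refers to the whole document
  if (!pointer.startsWith('/')) throw new Error(`Invalid JSON Pointer: ${ref}`);

  let current: unknown = root;
  for (const rawSegment of pointer.slice(1).split('/')) {
    // RFC 6901 escapes: decode '~1' to '/' first, then '~0' to '~'.
    const segment = rawSegment.replace(/~1/g, '/').replace(/~0/g, '~');
    if (typeof current !== 'object' || current === null || !(segment in current)) {
      throw new Error(`Cannot resolve ${ref}: missing segment '${segment}'`);
    }
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Trimmed-down copy of the manual schema above (only the parts the ref touches).
const bookSchema = {
  type: 'object',
  properties: {
    chapters: {
      type: 'object',
      additionalProperties: { $ref: '#/definitions/ChapterSchema' },
    },
  },
  definitions: {
    ChapterSchema: {
      type: 'object',
      properties: { title: { type: 'string' }, subtitle: { type: ['string', 'null'] } },
      required: ['title', 'subtitle'],
      additionalProperties: false,
    },
  },
};

const chapter = resolveLocalRef(bookSchema, bookSchema.properties.chapters.additionalProperties.$ref);
console.log(chapter === bookSchema.definitions.ChapterSchema); // true
```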

Hence, it looks like zod, although recommended in the OpenAI docs, does not work with OpenAI for nested structures: a nested structure seems to require definitions, and definitions do not seem to be supported by zod. Can anyone with more experience confirm or contradict this?

I also had this problem and thought it could not be solved using zod. Good news! You can keep and use your zod schema (with a small change)!

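import { z } from 'zod';

export const BookSchema = z.object({
  author: z.string(),
  publisher: z.string().nullable(),
  // z.object({}).catchall(...) instead of z.record(z.string(), ...):
  // any key is allowed, and every value must match the chapter object below.
  chapters: z.object({}).catchall(
    // The chapter schema is written inline instead of referencing ChapterSchema.
    z.object({
      title: z.string(),
      subtitle: z.string().nullable(),
    }).strict()
  ),
  // Zod object fields are required by default; .required() takes no argument
  // (or a mask like { author: true }), not an array of key names.
}).required().strict();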

Two issues:

  • z.record() doesn’t play well with the parser
  • referencing a zod schema in another zod schema also doesn’t work well with the parser
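In zod, z.record(z.string(), X) and z.object({}).catchall(X) both describe an object whose keys are arbitrary strings and whose values must match X; the answer's point is that they are handled differently when converted for the API. As a sketch of the shape the catchall schema encodes, here is a dependency-free runtime check you might apply to the parsed chapters field (`Chapter`, `isChapter` and `isChapterMap` are hypothetical names for illustration, not part of zod or openai).

```typescript
// Shape of one chapter, matching the inner z.object(...).strict() in the answer.
interface Chapter {
  title: string;
  subtitle: string | null;
}

// Hypothetical type guard: true when `value` has exactly the keys title and
// subtitle with the expected types (mirroring .strict(), which rejects extra keys).
function isChapter(value: unknown): value is Chapter {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  return (
    Object.keys(record).length === 2 &&
    typeof record['title'] === 'string' &&
    (typeof record['subtitle'] === 'string' || record['subtitle'] === null)
  );
}

// Hypothetical check for the catchall part: any string key is allowed,
// but every value must be a valid Chapter.
function isChapterMap(value: unknown): value is Record<string, Chapter> {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  return Object.values(value).every(isChapter);
}

const chapters: unknown = {
  ch1: { title: 'Introduction', subtitle: null },
  ch2: { title: 'Setup', subtitle: 'Installing the SDK' },
};
console.log(isChapterMap(chapters)); // true
console.log(isChapterMap({ ch1: { title: 'Intro' } })); // false: subtitle is missing
```

Note that an empty object also passes, just as z.object({}).catchall(X) accepts {}.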