Strategies for Handling Large Nested JSON Schema

I’m currently working with a large, deeply nested JSON schema that exceeds the output limits of structured outputs. The schema was originally written in Joi, and due to its complexity and size I cannot easily convert it to Zod or simplify it.

Here are the key challenges:

  • The schema is overly complex, with nested objects, arrays, and even recursive structures.
  • Additionally, the schema contains many unsupported features, such as `oneOf`, and it isn’t easily possible to clean these up.
  • Due to the structured output size limits, I need a way to validate and work with the schema without running into payload constraints.
  • My ultimate goal is to provide the entire schema in a manageable way, either programmatically or by breaking it down for specific use cases.
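One way to attack the unsupported-keyword problem is to pre-process the schema before handing it to the structured-output API. The sketch below assumes the Joi schema has already been exported to a plain JSON Schema object (via a converter); `flattenOneOf` is a hypothetical helper name, and collapsing each `oneOf` to its first variant is deliberately lossy, so treat it as a starting point rather than a drop-in fix:

```javascript
// Recursively rewrite a JSON Schema object so that every `oneOf`
// is collapsed to its first variant. This trades fidelity for
// compatibility with structured outputs — review the diff before use.
function flattenOneOf(schema) {
  if (Array.isArray(schema)) {
    return schema.map(flattenOneOf);
  }
  if (schema !== null && typeof schema === 'object') {
    // If this node is a oneOf, keep only the first alternative.
    if (Array.isArray(schema.oneOf) && schema.oneOf.length > 0) {
      return flattenOneOf(schema.oneOf[0]);
    }
    const out = {};
    for (const [key, value] of Object.entries(schema)) {
      out[key] = flattenOneOf(value);
    }
    return out;
  }
  return schema; // primitives pass through unchanged
}
```

A smarter variant could pick the most general alternative, or merge the variants' properties, instead of always taking the first one.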

What is the suggested approach here? Has anyone solved a similar challenge?

Welcome to the community!

What model are you using?

My default answer for this is to skip structured outputs*, potentially use a legacy GPT-4 model, perhaps switch to XML, make the schema more model-accessible if necessary (i.e. skip SOAP, make sure follow-on entities are retrievable from the current entity), and allow the model to continue interrupted generations.
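If you let the model continue interrupted generations, you need a way to stitch the pieces back together and decide when you're done. A minimal sketch, with `stitchContinuation` as a hypothetical helper; the actual re-prompting call (e.g. "continue exactly where you left off") is up to your API client and is omitted here:

```javascript
// Concatenate an interrupted generation with its continuation and
// check whether the result is now parseable JSON. The caller loops,
// re-prompting the model with the text so far, until `done` is true
// or a retry budget is exhausted.
function stitchContinuation(partial, continuation) {
  const combined = partial + continuation;
  try {
    return { done: true, value: JSON.parse(combined), text: combined };
  } catch {
    return { done: false, value: null, text: combined };
  }
}
```

The same loop works for XML output if you swap `JSON.parse` for an XML parser and a well-formedness check.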

Of course, I’d also suggest you try to find a way to split up your schema and break your workflow step into multiple sub-tasks. You mentioned that might be hard to do, but I’d nonetheless urge you to think about it.
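Splitting can sometimes be done mechanically: extract each top-level property into its own smaller schema, run one structured-output call per piece, then merge the results. A sketch under the assumption that you have a plain JSON Schema object with top-level `properties`; `splitByProperty` is a hypothetical helper name:

```javascript
// Split a JSON Schema with top-level `properties` into one
// self-contained sub-schema per property, so each piece can fit
// under the structured-output size limits on its own.
function splitByProperty(schema) {
  const subSchemas = {};
  for (const [name, propSchema] of Object.entries(schema.properties ?? {})) {
    subSchemas[name] = {
      type: 'object',
      // Carry shared definitions along so $ref pointers stay resolvable.
      ...(schema.$defs ? { $defs: schema.$defs } : {}),
      properties: { [name]: propSchema },
      required: (schema.required ?? []).includes(name) ? [name] : [],
    };
  }
  return subSchemas;
}
```

This only helps if the top-level properties are reasonably independent; recursive structures that reference each other may still force some pieces to travel together.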

I hope that some of these ideas might help you get started!

*I think I’m in the minority here, but that’s what I’d do