Strategies for Handling Large Nested JSON Schema

I’m currently working with a large, deeply nested JSON schema that exceeds the output limits of structured outputs. The schema was originally written in Joi, and due to its complexity and size I cannot easily convert it to Zod or simplify it.

Here are the key challenges:

  • The schema is overly complex, with nested objects, arrays, and even recursive structures.
  • Additionally, the schema contains many unsupported features, such as `oneOf`, and it isn’t easily possible to clean these up.
  • Due to the structured output size limits, I need a way to validate and work with the schema without running into payload constraints.
  • My ultimate goal is to provide the entire schema in a manageable way, either programmatically or by breaking it down for specific use cases.
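One way to attack the unsupported-keyword problem is to pre-process the schema before handing it to the structured-output API. The sketch below assumes the Joi schema has already been exported to a plain JSON Schema object (via a converter); `flattenOneOf` is a hypothetical helper name, and collapsing each `oneOf` to its first variant is deliberately lossy, so treat it as a starting point rather than a drop-in fix:

```javascript
// Recursively rewrite a JSON Schema object so that every `oneOf`
// is collapsed to its first variant. This trades fidelity for
// compatibility with structured outputs — review the diff before use.
function flattenOneOf(schema) {
  if (Array.isArray(schema)) {
    return schema.map(flattenOneOf);
  }
  if (schema !== null && typeof schema === 'object') {
    // If this node is a oneOf, keep only the first alternative.
    if (Array.isArray(schema.oneOf) && schema.oneOf.length > 0) {
      return flattenOneOf(schema.oneOf[0]);
    }
    const out = {};
    for (const [key, value] of Object.entries(schema)) {
      out[key] = flattenOneOf(value);
    }
    return out;
  }
  return schema; // primitives pass through unchanged
}
```

A smarter variant could pick the most general alternative, or merge the variants' properties, instead of always taking the first one.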

What is the suggested approach here? Has anyone solved a similar challenge?

Welcome to the community!

What model are you using?

My default answer for this is to skip structured outputs*, potentially use a legacy GPT-4 model, perhaps switch to XML, make the schema more model-accessible if necessary (i.e. skip SOAP, make sure follow-on entities are retrievable from the current entity), and allow the model to continue interrupted generations.
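If you let the model continue interrupted generations, you need a way to stitch the pieces back together and decide when you're done. A minimal sketch, with `stitchContinuation` as a hypothetical helper; the actual re-prompting call (e.g. "continue exactly where you left off") is up to your API client and is omitted here:

```javascript
// Concatenate an interrupted generation with its continuation and
// check whether the result is now parseable JSON. The caller loops,
// re-prompting the model with the text so far, until `done` is true
// or a retry budget is exhausted.
function stitchContinuation(partial, continuation) {
  const combined = partial + continuation;
  try {
    return { done: true, value: JSON.parse(combined), text: combined };
  } catch {
    return { done: false, value: null, text: combined };
  }
}
```

The same loop works for XML output if you swap `JSON.parse` for an XML parser and a well-formedness check.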

Of course, I’d also suggest you try to find a way to split up your schema and break your workflow step into multiple sub-tasks. You mentioned that might be hard to do, but I’d nonetheless urge you to think about it.
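Splitting can sometimes be done mechanically: extract each top-level property into its own smaller schema, run one structured-output call per piece, then merge the results. A sketch under the assumption that you have a plain JSON Schema object with top-level `properties`; `splitByProperty` is a hypothetical helper name:

```javascript
// Split a JSON Schema with top-level `properties` into one
// self-contained sub-schema per property, so each piece can fit
// under the structured-output size limits on its own.
function splitByProperty(schema) {
  const subSchemas = {};
  for (const [name, propSchema] of Object.entries(schema.properties ?? {})) {
    subSchemas[name] = {
      type: 'object',
      // Carry shared definitions along so $ref pointers stay resolvable.
      ...(schema.$defs ? { $defs: schema.$defs } : {}),
      properties: { [name]: propSchema },
      required: (schema.required ?? []).includes(name) ? [name] : [],
    };
  }
  return subSchemas;
}
```

This only helps if the top-level properties are reasonably independent; recursive structures that reference each other may still force some pieces to travel together.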

I hope that some of these ideas might help you get started!

*I think I’m in the minority here, but that’s what I’d do