Structured Output Issue in GPT-4o API – Response Truncation at Specific Index

Hello OpenAI Forum,

I have been experiencing a consistent issue when using the GPT-4o-2024-11-20 model via the API with Structured Outputs. The model fails to return complete responses when processing large JSON inputs, consistently truncating the output at a specific index.

Issue Description

When submitting a structured JSON input for processing, the API successfully generates output but stops at index 255, even though the input continues beyond that point. The pattern is reproducible across multiple attempts and with different files, where truncation occurs at similar indices. Below is a generalized example:

Input JSON Sample (Generalized):

{
    "Data": [
        {
            "Index": 211,
            "Text": "Sample text for processing."
        },
        ...
        {
            "Index": 340,
            "Text": "Another sample text at a later index."
        }
    ]
}

Expected Output Format:

{
    "ProcessedData": [
        {
            "Index": 211,
            "ProcessedText": "Translated or structured response."
        },
        ...
        {
            "Index": 340,
            "ProcessedText": "Final expected output."
        }
    ]
}

Actual Output (Truncated at Index 255):

{
    "ProcessedData": [
        {
            "Index": 211,
            "ProcessedText": "Translated or structured response."
        },
        ...
        {
            "Index": 255,
            "ProcessedText": "Last returned response before truncation."
        }
    ]
}
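
For reference, the request is made roughly like this (simplified Python with the OpenAI SDK; the payload is abbreviated and the system prompt is a placeholder):

import json
from openai import OpenAI

client = OpenAI()

# Abbreviated input; the real payload carries entries for indices 211-340.
input_payload = {"Data": [{"Index": 211, "Text": "Sample text for processing."}]}

# Strict JSON schema describing the expected output shape.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "processed_data",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "ProcessedData": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "Index": {"type": "integer"},
                            "ProcessedText": {"type": "string"},
                        },
                        "required": ["Index", "ProcessedText"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["ProcessedData"],
            "additionalProperties": False,
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[
        {"role": "system", "content": "Process each entry and return ProcessedData."},
        {"role": "user", "content": json.dumps(input_payload)},
    ],
    response_format=response_format,
)
result = json.loads(response.choices[0].message.content)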

Observations:

  • The API consistently stops generating output at index 255, regardless of the input.
  • The structured output schema does not explicitly define a hard limit.
  • No error message is returned, only an incomplete response (see the finish_reason check after this list).
  • The issue persists across multiple runs, even with slight variations in input format.
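
One way to narrow this down, a minimal check assuming the Python call sketched above: inspect finish_reason on the response to see whether generation was stopped by the token limit rather than by anything schema-related.

# Why did generation stop?
choice = response.choices[0]
print(choice.finish_reason)
# "stop"   -> the model finished on its own
# "length" -> the output hit max_tokens or the model's output-token cap,
#             which would explain a silent truncation with no error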

Questions:

  1. Is there a predefined token or response size limit in Structured Output mode that causes truncation?
  2. Is there a workaround to ensure complete responses, such as breaking the input into smaller chunks (a sketch of what I have in mind follows below)?
  3. Has anyone else encountered similar structured output truncation in the GPT-4o API, and if so, how was it resolved?
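
For question 2, this is roughly the chunking workaround I have in mind, a sketch reusing the client and response_format from the example above; the batch size of 50 is arbitrary:

def process_in_chunks(entries, batch_size=50):
    """Process the Data array in batches and merge the results."""
    merged = []
    for start in range(0, len(entries), batch_size):
        batch = {"Data": entries[start:start + batch_size]}
        response = client.chat.completions.create(
            model="gpt-4o-2024-11-20",
            messages=[
                {"role": "system", "content": "Process each entry and return ProcessedData."},
                {"role": "user", "content": json.dumps(batch)},
            ],
            response_format=response_format,
        )
        merged.extend(json.loads(response.choices[0].message.content)["ProcessedData"])
    return merged

processed = process_in_chunks(input_payload["Data"])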

Any insights or guidance on how to resolve this would be greatly appreciated. Thank you!

Right off the top, this sounds like a 256 limit somewhere in their backend code for the “type” in the “database” that’s holding those values.

I know this isn’t really answering your question fully, but I’ve run into similar issues internally when I have database fields designed with certain char limits like (255) or (256).

I’m not familiar enough with all these things to understand why these are the “normal limits” for certain field types or what have you, but I’ve seen it at play.

So I would imagine somewhere in the backend there’s a limit defined for JSON types at 255 or 256, and it’s simply not holding more values than that from the input.

Could be totally off, and it might have something else to do with how that input is processed, assigned tokens, and limited at that level, but it’s an interesting number to me in that it points to a similar issue to what I’ve experienced at an internal system level.

Obviously you’re probably more aware of these kinds of things than I am, but what I would suggest trying is simply splitting it across multiple “inputs” in the same way you would split it across multiple “role: user” messages.

I’m not familiar enough with the new responses API, but in the completions API I often split up messages as needed.
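
Something like this is what I mean, very rough Python against the completions API; the chunk size is just an example and “entries” stands in for your full Data array:

# Spread the big Data array across several "role: user" messages
# in a single request, instead of one giant message.
chunks = [entries[i:i + 50] for i in range(0, len(entries), 50)]

messages = [{"role": "system", "content": "Process every entry across all of the following messages."}]
for chunk in chunks:
    messages.append({"role": "user", "content": json.dumps({"Data": chunk})})

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=messages,
    response_format=response_format,  # the same schema from the original post
)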

Combined with a “no_api_call” flag in your system, that can be powerful: you can build a set of messages up to the raw context window/input limit (i.e. the 1 MB limit or what have you), then send the “full set of messages” and simply instruct the model to provide output. That way you could at least test whether it’s an actual OUTPUT limiter or an INPUT limiter… and you might get around per-message limitations on data types.
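
In code, the pattern I’m describing is something like this, again very rough; “no_api_call” is just my own flag name inside my system, not an API feature:

# Accumulate messages locally while no_api_call is set; nothing gets sent.
no_api_call = True
pending = [{"role": "system", "content": "Wait for all data before responding."}]

for chunk in chunks:  # the same chunks as in the sketch above
    pending.append({"role": "user", "content": json.dumps({"Data": chunk})})

# Once everything is queued, flip the flag and send the full set at once,
# ending with an instruction to produce the output now.
no_api_call = False
pending.append({"role": "user", "content": "All data sent. Produce ProcessedData for every index."})

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=pending,
    response_format=response_format,
)
# If this still truncates around 255 entries, the limiter is on the OUTPUT
# side; if it completes, the limit was on the per-message INPUT.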