Large JSON responses from the Assistants API are truncated

I’m using the Assistants API with GPT-4o to extract structured JSON output from my text input.

I have an input that generates a lot of JSON output. It appears that when the output reaches around 7,800 characters (about 16K), it simply stops and the result is invalid JSON.

This behavior is observed with JSON Mode both enabled and disabled.

Are there any suggestions for a way around this? Is this an OpenAI limitation?
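For reference, this is roughly how the failure shows up on my end (a minimal sketch with a made-up truncated payload):

```python
import json

# Hypothetical example of a response that was cut off mid-object:
raw = '{"items": [{"id": 1, "name": "alpha"}, {"id": 2, "na'

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # A response that stops mid-string fails to parse at all.
    print(f"Invalid JSON after {len(raw)} chars: {err}")
```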

As far as I know, GPT-4o still has an 8K token limit, so yes, this is expected.

You might find this thread interesting:

TL;DR: See if you can prepend line numbers to the input and have the model return start and end line numbers instead of the full text.
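A rough sketch of that idea with Chat Completions (the prompt wording, the input file name, and the range format are illustrative, not from the linked thread):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

document = open("input.txt").read()  # hypothetical input file

# Prepend a line number to every input line so the model can cite ranges.
numbered = "\n".join(
    f"{i + 1}: {line}" for i, line in enumerate(document.splitlines())
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                'For each extracted passage, return JSON such as '
                '{"start": 12, "end": 40} with line numbers instead of '
                'repeating the text itself.'
            ),
        },
        {"role": "user", "content": numbered},
    ],
    response_format={"type": "json_object"},
)

print(response.choices[0].message.content)
```

Because the model only emits line ranges, the limited output budget goes much further; you then slice the actual text out of the original input locally.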

'message': 'max_tokens is too large: 4111. This model supports at most 4096 completion tokens, whereas you provided 4111.'

max_tokens sets the largest response you can receive back, and it is capped by OpenAI, likely because output quality degrades even further on long responses than it already does.

On top of that, the model is trained to keep its output well below even that cap.

max_tokens is a parameter you control when using Chat Completions. Assistants is built from pieces that are out of your control.
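For illustration, a minimal Chat Completions call that pins the parameter explicitly (the message content is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the records as JSON."}],
    response_format={"type": "json_object"},
    max_tokens=4096,  # explicit output cap; anything higher is rejected
)
print(response.choices[0].message.content)
```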

https://platform.openai.com/docs/models/overview indicates a 128K context window for gpt-4o.

On that same page, some preview models such as gpt-4-0125-preview are listed with a “maximum of 4,096 output tokens,” but gpt-4o does not appear to be documented as having such a limitation. If it does exist, it would explain the behavior I’m seeing.


I have run into the exact same issue. It would be nice to get some clarification on the actual output token limit for GPT-4o.

GPT-4o has a semi-documented limit of 4,096 output tokens. If you try to push max_tokens higher, the API complains very loudly: gpt4o maxes at 4096 output tokens — Prompt Fiddle
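You can reproduce that complaint in a few lines (a sketch assuming the v1 Python SDK, where a 400 response raises BadRequestError):

```python
from openai import OpenAI, BadRequestError

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

try:
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=5000,  # deliberately above the 4,096-token output cap
    )
except BadRequestError as err:
    # Prints something like: "max_tokens is too large: 5000. This model
    # supports at most 4096 completion tokens, whereas you provided 5000."
    print(err)
```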