Hitting max output token limit for 4.1-mini

I am trying to use the 4.1-mini model to generate some fairly lengthy JSON. When I do this, I get a response with status "incomplete" and the reason given as "max_output_tokens", whose limit is 32,768 for this particular model.
I then tried to set the max_output_tokens like so:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1-mini",
    temperature=0,
    top_p=1,
    max_output_tokens=100000,
    input=[
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user", "content": questions},
    ],
)

This new limit is more than enough for the JSON I need. But when I run this, I hit the same problem: the response still comes back incomplete at 32,768 tokens, with the reason "max_output_tokens".
Why is my custom limit not being applied, and what can I do to solve this?
Thanks!

The model has a hard limit at which OpenAI has capped its generation capability.

This is likely done both because of the escalating cost of long generations and because output quality tends to degrade in very long responses the model was not trained for, usually when the AI "goes off the rails" (which may be the case here).

You cannot ask for more with an API parameter.

Welcome to the community @havock2926.

Every OpenAI model has a maximum output token limit, and users cannot override it through the API. These limits are typically a function of the model's architecture and training.

If you would like to generate 100,000 output tokens, you could look at the o4-mini or o3 models, which have a maximum output token limit of 100,000. If you want to keep using the 4.1-mini model, you could break your task up creatively into subtasks whose outputs can easily be stitched together.
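The subtask approach can be sketched like this. The `chunk`/`stitch` helpers and the batch size are illustrative, not part of the SDK; the model replies are faked here so the stitching logic runs standalone, but in practice each batch would go out as its own `responses.create()` call:

```python
import json

def chunk(items, size):
    # Split the question list into batches small enough that each
    # batch's JSON output stays under the model's output-token ceiling.
    return [items[i:i + size] for i in range(0, len(items), size)]

def stitch(json_parts):
    # Merge the per-batch JSON arrays back into a single list.
    merged = []
    for part in json_parts:
        merged.extend(json.loads(part))
    return merged

batches = chunk(["q1", "q2", "q3", "q4", "q5"], 2)
# Stand-in for one responses.create() call per batch:
fake_replies = [json.dumps([f"answer to {q}" for q in batch]) for batch in batches]
print(stitch(fake_replies))
```

Prompting each batch with the same system instructions keeps the per-batch JSON fragments consistent enough to merge.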

You can easily look up and compare the limits for each model on this official page. Hope this helps.
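As a quick sanity check before committing to a model, the ceilings mentioned in this thread can be put in a small table (values as of this writing, so verify them against the models page; `pick_model` is just an illustrative helper):

```python
# Output-token ceilings discussed above; these can change over time.
MAX_OUTPUT_TOKENS = {
    "gpt-4.1-mini": 32_768,
    "o4-mini": 100_000,
    "o3": 100_000,
}

def pick_model(needed_output_tokens):
    # Return the models whose ceiling can fit the whole response.
    return [m for m, cap in MAX_OUTPUT_TOKENS.items() if cap >= needed_output_tokens]

print(pick_model(100_000))  # ['o4-mini', 'o3']
```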
