Why is GPT-4.1 output capped at ~6,000 tokens despite a 32,768-token limit?

Hello,

I’m building an AI tool that uses GPT-4.1. I’ve noticed that when the tool needs to produce a long answer, the model tends to shorten it because of output limitations. This often results in sub-optimal answers, so I really need to increase the output length.

I ran some tests such as “repeat exactly the following text …” to measure the maximum length the model can produce. It seems the actual maximum completion_tokens is under 6,000 (prompt_tokens is around 6,000 in these tests). Even setting max_output_tokens: 25000 didn’t help.
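For reference, here is a minimal sketch of the kind of test I ran (assuming the Responses API and the official Python SDK; the long text itself is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: a long passage of a few thousand tokens that the model should echo back.
long_text = "..."

response = client.responses.create(
    model="gpt-4.1",
    input=f"Repeat exactly the following text:\n\n{long_text}",
    max_output_tokens=25000,
)

# How many tokens went in and came out, and whether the output was cut off by the limit.
print("input tokens: ", response.usage.input_tokens)
print("output tokens:", response.usage.output_tokens)
print("status:       ", response.status)  # "incomplete" if max_output_tokens was hit
```

The output tokens reported here stay well under 6,000 even though the requested limit is 25,000.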

Does anyone know why my output with GPT-4.1 seems to be capped under 6,000 tokens, while the website says the limit is 32,768 tokens? What can I do to increase the output capacity of my tool?

Thank you and regards

Tie