Hello,
I’m building an AI tool on top of GPT-4.1. I’ve noticed that when the tool needs to produce a long answer, the model tends to shorten it because of output limitations. This often results in sub-optimal answers, so I really need to increase the output length.
I ran some tests like “repeat exactly the following text …” to measure the maximum length the model can produce. The actual completion_tokens seems to stay under 6,000 (prompt_tokens is around 6,000 in these tests). Even setting max_output_tokens: 25000 didn’t help.
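For reference, here is a minimal sketch of the kind of repetition test I mean, assuming the OpenAI Python SDK and the Responses API (the exact response fields, like incomplete_details, are my best guess at the current API shape; the model name and limits are just the values from my setup):

```python
# Sketch of a "repeat exactly" test to measure real output length.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import os

MODEL = "gpt-4.1"          # model under test
MAX_OUTPUT_TOKENS = 25000  # requested cap; docs list 32,768 for GPT-4.1

def build_request(text: str) -> dict:
    """Build the request kwargs for a 'repeat exactly' test."""
    return {
        "model": MODEL,
        "input": f"Repeat exactly the following text:\n{text}",
        "max_output_tokens": MAX_OUTPUT_TOKENS,
    }

def measure_output_tokens(text: str) -> int:
    """Send the request and report how many output tokens came back.
    The status / incomplete_details fields should show whether the
    requested cap was actually the reason the output stopped."""
    from openai import OpenAI  # imported here so the sketch loads without the SDK
    client = OpenAI()
    resp = client.responses.create(**build_request(text))
    print(resp.status, getattr(resp, "incomplete_details", None))
    return resp.usage.output_tokens

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    # A long input forces the model toward its real output ceiling.
    print(measure_output_tokens("lorem ipsum dolor sit amet " * 1000))
```

If the response comes back with an "incomplete" status whose reason is the token cap, the limit is being hit server-side; if it finishes normally at ~6,000 tokens, the model is stopping on its own.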
Does anyone know why my output limit with GPT-4.1 seems to be under 6,000 tokens, while the documentation says 32,768 tokens? What can I do to increase the output capacity of my tool?
Thank you and regards
Tie