Why is GPT-4.1 output capped at ~6,000 tokens despite a 32,768-token limit?

Hello,

I’m building an AI tool that uses GPT-4.1. I’ve noticed that when the tool needs to produce a long answer, the model tends to shorten it because of output limitations. This often results in sub-optimal answers, so I really need to increase the output length.

I ran some tests such as “repeat exactly the following text …” to measure the maximum length the model can produce. It seems the actual maximum completion_tokens is under 6,000 (prompt_tokens is around 6,000 in these tests). Even setting max_output_tokens: 25000 didn’t help.
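For reference, here is a minimal sketch of the kind of test I ran (assuming the Responses API and the official Python SDK; the long text itself is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: a long passage of a few thousand tokens that the model should echo back.
long_text = "..."

response = client.responses.create(
    model="gpt-4.1",
    input=f"Repeat exactly the following text:\n\n{long_text}",
    max_output_tokens=25000,
)

# How many tokens went in and came out, and whether the output was cut off by the limit.
print("input tokens: ", response.usage.input_tokens)
print("output tokens:", response.usage.output_tokens)
print("status:       ", response.status)  # "incomplete" if max_output_tokens was hit
```

The output tokens reported here stay well under 6,000 even though the requested limit is 25,000.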

Does anyone know why my output with GPT-4.1 seems to be capped under 6,000 tokens, while the website says the limit is 32,768 tokens? What can I do to increase the output capacity of my tool?

Thank you and regards

Tie