Output length of gpt-4o and gpt-4.5 is far shorter than expected for large input

Hi there, I have an input of about 20k tokens that I want 4o (2024-11-20) / 4.5 (preview-2025-02-27) to rephrase into a certain structure for me. The max output length is set to 16k tokens, and I expect the actual output to be at least 7k. However, the output is always around 1-2k tokens, no matter how much I encourage longer output in the developer message. What is the reason, and how should I solve it? Thanks!
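
To be concrete, the call looks roughly like this (a simplified sketch, not my exact code; the file name and prompt wording are placeholders):

```python
from openai import OpenAI

client = OpenAI()

with open("transcript.txt") as f:  # ~20k-token transcript, placeholder file name
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    max_tokens=16000,  # output cap is set high, yet completions still stop around 1-2k tokens
    messages=[
        {
            "role": "developer",
            "content": "Rephrase the transcript below into the target structure. "
                       "Do not shorten or summarize; keep every point.",
        },
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```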

I am looking for solutions other than breaking the input into pieces, because I am wondering why the output length in a single run is far below max-tokens, given that input + max-tokens is also far below the max context size. Does the nature of GPT models lead to an underlying “true” max output length, regardless of the parameter we set? In other words, even when the task does not require much intelligence, does a large input (say, a 45-minute interview or 2-hour seminar transcript) force us to use an o-series model? (The truth is that I find the gpt-series outperforms the o-series on short inputs for my task, so switching to the o-series for large inputs is not a perfect solution, just a compromise.)

I have the same question. Is there any way to solve it?

OpenAI is really crushin’ it - in terms of crushing down the output length the model will produce. Right at about 1700 tokens.

You can try o3-mini and see whether it has suffered the same retroactive damage.

Or try Gemini Flash 2.5, which will write 10x the length without hesitation.

The huge, flashy context window has always seemed to be for input only. If OpenAI's training data never has the AI producing that many tokens, you'll never be able to prompt it into doing so. You might consider segmenting your input and having the model process each piece one at a time. This costs more overall, but you'll benefit from prompt caching on the shared prefix.
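
Something like this, as a rough sketch with the Python SDK (the chunk size, model, and instruction text are placeholders, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = (
    "Rephrase the following transcript segment into the target structure. "
    "Keep every detail; do not summarize."
)

def chunk_text(text: str, max_chars: int = 12_000) -> list[str]:
    # Naive fixed-size split; a real splitter would cut on speaker turns
    # or paragraph boundaries so each segment stays coherent.
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]

def rephrase(transcript: str) -> str:
    parts = []
    for segment in chunk_text(transcript):
        # Keeping the same instruction prefix at the start of every request
        # is what lets prompt caching kick in (once the prefix is long enough).
        response = client.chat.completions.create(
            model="gpt-4o-2024-11-20",
            messages=[
                {"role": "system", "content": INSTRUCTIONS},
                {"role": "user", "content": segment},
            ],
        )
        parts.append(response.choices[0].message.content)
    return "\n\n".join(parts)
```

Each segment gets its own full-length completion, so the per-call output ceiling stops mattering for the overall result.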

You could also try fine-tuning. With a good dataset, you may even be able to drop to a “mini” and save on inference.
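
A minimal sketch of that route with the Python SDK (the file name, base model, and JSONL contents here are assumptions, not specific recommendations):

```python
from openai import OpenAI

client = OpenAI()

# Each line of the JSONL is one worked example, roughly:
# {"messages": [{"role": "system", "content": "Rephrase into the target structure."},
#               {"role": "user", "content": "<transcript segment>"},
#               {"role": "assistant", "content": "<long restructured output>"}]}
training_file = client.files.create(
    file=open("rephrase_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```

The key is that the assistant turns in your training examples are as long as the output you actually want; that teaches the model your target length along with your target structure.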