Hi there, I have an input of about 20k tokens. I want gpt-4o (2024-11-20) / gpt-4.5-preview (2025-02-27) to rephrase it in a specific structure for me. The max output length is set to 16k tokens, and I expect the real output to be at least 7k. However, the output is always only around 1-2k, no matter how much I encourage longer output in the developer message. What is the reason, and how should I solve it? Thanks!
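For reference, here is a minimal sketch of the kind of call I'm making (Python SDK; the file name, prompt wording, and `transcript` variable are placeholders, not my exact setup):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# ~20k tokens of interview/seminar transcript (placeholder file)
with open("interview_transcript.txt") as f:
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    max_tokens=16384,  # the documented output cap for this model
    messages=[
        {
            "role": "developer",
            "content": (
                "Rephrase the entire transcript in the target structure. "
                "Do not summarize or shorten; keep at least the original "
                "level of detail."
            ),
        },
        {"role": "user", "content": transcript},
    ],
)

# finish_reason is "stop" (the model ends on its own), not "length",
# so it is not being truncated by the max_tokens cap
print(response.choices[0].finish_reason)
print(response.usage.completion_tokens)  # consistently only ~1-2k
```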
I'm looking for solutions other than breaking the input into pieces, because I want to understand why the output length in a single run is far below max_tokens, given that input + max_tokens is also far below the maximum context size. Does this mean the nature of GPT models leads to an underlying "true" maximum output length, regardless of the parameter we set? In other words, even when the task does not require much intelligence, once the input is large (say, a 45-minute interview or a 2-hour seminar transcript), do we have to use an o-series model? (The thing is, I find the gpt-series outperforms the o-series on short inputs for my task, so switching to the o-series for large inputs is not a perfect solution for me, more of a compromise.)