Does increasing the max_length parameter actually encourage the model to produce longer answers, or is it purely an upper bound used for truncation when necessary? In other words, does the underlying LLM know about max_length when deciding what to write, or is it just a short-circuit mechanism tacked on afterwards (e.g. for cost control)?
My understanding was that it is purely an upper bound, but I have seen anecdotal reports that disagree, so I am looking for an authoritative answer, ideally from someone inside OpenAI.
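To make the question concrete, here is a toy sketch of my current mental model, where max_length is purely a loop bound that the decoding loop enforces and the model itself never sees. All names and token values here are made up for illustration; this is not real OpenAI code.

```python
STOP_TOKEN = 0  # assumed end-of-sequence marker (illustrative)

def generate(model_step, prompt_tokens, max_length):
    """Greedy decode until the model emits STOP_TOKEN or max_length is hit.

    model_step: callable mapping the token sequence so far to the next token.
    Note the model never sees max_length -- it is only a loop bound,
    which is the "pure truncation" interpretation I am asking about.
    """
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        nxt = model_step(tokens)
        tokens.append(nxt)
        if nxt == STOP_TOKEN:
            break  # model chose to stop on its own
    return tokens

# Toy "model" that always wants to emit five 1s before stopping.
def toy_model(tokens):
    return 1 if len(tokens) < 6 else STOP_TOKEN

print(generate(toy_model, [9], max_length=4))   # cut off mid-answer: [9, 1, 1, 1]
print(generate(toy_model, [9], max_length=20))  # natural stop: [9, 1, 1, 1, 1, 1, 0]
```

Under this model, raising max_length from 4 to 20 never changes what the model wants to write; it only changes whether the answer gets cut off. The reports I mentioned claim the real behavior differs from this, which is what I would like confirmed or refuted.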