If it were possible for the model to reach a coherent stopping point in its output, such as the end of a sentence, before the max tokens limit is hit, the models would be much easier to work with when processing large or variable amounts of data. In these cases, I don't want to instruct the model to be brief, because I want to maximize output. But if the model runs up against its max tokens, the output ends mid-sentence and the response is awkward to use.
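In the meantime, one client-side workaround is to trim a truncated response back to its last complete sentence before using it. Below is a minimal sketch; the `trim_to_last_sentence` helper is hypothetical, and it assumes sentences end with `.`, `!`, or `?`, which is a simplification (abbreviations, quotes, and decimal numbers would need more careful handling):

```python
def trim_to_last_sentence(text: str) -> str:
    """Trim possibly-truncated model output back to the last complete sentence.

    Assumes a sentence ends with '.', '!', or '?'. If no such boundary
    exists, the text is returned unchanged rather than discarded.
    """
    enders = ".!?"
    # Index of the last sentence-ending character anywhere in the text.
    cut = max(text.rfind(ch) for ch in enders)
    if cut == -1:
        # No sentence boundary found; keep the original text.
        return text
    # Keep everything up to and including the final sentence ender.
    return text[:cut + 1]


# Example: a response cut off mid-sentence by the max tokens limit.
truncated = "First point is complete. Second point was cut o"
print(trim_to_last_sentence(truncated))
```

This throws away the partial trailing sentence, so it trades some output length for coherence; with an API that reports why generation stopped (e.g. a "length" stop reason), you could apply the trim only when the response was actually truncated.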