Max_tokens limits the total tokens used instead of the output tokens

Welcome to the Forum Nicolas!

A couple of points in response to your issue:

  1. By default, the latest models are limited to 4,096 output tokens, independent of the context window size, so that is the absolute maximum you could get. The number of output tokens the model can actually return is further constrained by the number of input tokens you send, because input and output together must fit within the context window. For example, with gpt-3.5-turbo and its 16,385-token context window, providing 14,000 input tokens leaves only 2,385 tokens available for the output.

  2. In practice, the model rarely returns the full 4,096 output tokens. Besides the number of input tokens, the second factor that influences the output length is your prompt. There are certain approaches and wordings you can use to get more detailed responses, sometimes reaching over 3,000 tokens. It typically takes a bit of trial and error.

  3. The max_tokens parameter does not increase how many tokens a model produces in response to a specific prompt. It is simply a means to cap the model's response at a maximum number of tokens. For example, if you set the value to 200, the model's response will be cut off at exactly 200 tokens - even if that is in the middle of a sentence (see the sketch after this list).
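
To illustrate points 1 and 3, here is a minimal sketch using the openai Python SDK (assuming v1.x of the library; the model name, prompt, and max_tokens value are just placeholders for your own setup). With max_tokens=200 the reply is cut off after at most 200 tokens, and finish_reason comes back as "length" instead of "stop". The usage object also shows how your input tokens eat into the overall context-window budget:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model
    messages=[{"role": "user", "content": "Explain tokenization in detail."}],
    max_tokens=200,  # hard cap: generation stops after 200 output tokens
)

choice = response.choices[0]
print(choice.message.content)

# "length" means the cap was hit mid-generation; "stop" means the model
# finished on its own before reaching max_tokens.
print("finish_reason:", choice.finish_reason)

# Input and output tokens share the context window.
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
```

Note that raising max_tokens only raises the ceiling; it does not make the model write more than it otherwise would.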

Bearing these three points in mind, perhaps you can share details on what you are trying to achieve, including an example prompt, and we may be able to offer some additional ideas on how to increase your output tokens.
