Clarification for max_tokens

Hi @nashid.noor, @overbeck.christopher and @kathyh

Every model has a context length, and it cannot be exceeded.

As I shared above, max_tokens only specifies the maximum number of tokens to generate in the completion; it is not necessarily the number that will actually be generated.

However, if the number of tokens in the prompt plus max_tokens exceeds the model's context length, the request is considered invalid and you'll get a 400 error like this one:

> This model's maximum context length is 4096 tokens. However, you requested 4157 tokens (62 in the messages, 4095 in the completion). Please reduce the length of the messages or completion.
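To make the arithmetic concrete, here's a quick sketch using the numbers from the error above (the model name and exact token counts are just illustrative): the largest valid max_tokens is the context length minus the prompt's token count.

```python
# Numbers taken from the example error message above:
# a 4096-token context window and 62 tokens in the messages.
CONTEXT_LENGTH = 4096   # model's maximum context length
prompt_tokens = 62      # tokens in the messages (count these with a tokenizer)

# The completion can use whatever the prompt didn't.
largest_valid_max_tokens = CONTEXT_LENGTH - prompt_tokens
print(largest_valid_max_tokens)  # 4034

# The failing request asked for 4095 completion tokens,
# overshooting the context window by 61 tokens -> 400 error.
overshoot = (prompt_tokens + 4095) - CONTEXT_LENGTH
print(overshoot)  # 61
```

So with a 62-token prompt, setting max_tokens to 4034 or less would have made the request valid.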