Clarification for max_tokens

Hi @nashid.noor, @overbeck.christopher and @kathyh

Every model has a context length, and it cannot be exceeded.

As I shared above, max_tokens only specifies the maximum number of tokens to generate in the completion; it is not necessarily the number that will actually be generated.

However, if the number of tokens in the prompt plus max_tokens exceeds the model's context length, the request is considered invalid and you'll get a 400 error like this one:

> This model's maximum context length is 4096 tokens. However, you requested 4157 tokens (62 in the messages, 4095 in the completion). Please reduce the length of the messages or completion.
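To make the arithmetic concrete, here's a quick sketch using the numbers from the error above (the model name and exact token counts are just illustrative): the largest valid max_tokens is the context length minus the prompt's token count.

```python
# Numbers taken from the example error message above:
# a 4096-token context window and 62 tokens in the messages.
CONTEXT_LENGTH = 4096   # model's maximum context length
prompt_tokens = 62      # tokens in the messages (count these with a tokenizer)

# The completion can use whatever the prompt didn't.
largest_valid_max_tokens = CONTEXT_LENGTH - prompt_tokens
print(largest_valid_max_tokens)  # 4034

# The failing request asked for 4095 completion tokens,
# overshooting the context window by 61 tokens -> 400 error.
overshoot = (prompt_tokens + 4095) - CONTEXT_LENGTH
print(overshoot)  # 61
```

So with a 62-token prompt, setting max_tokens to 4034 or less would have made the request valid.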