My interpretation of max_tokens is that it specifies an upper bound on the length of the generated completion.
However, the documentation is confusing. I am referring to the official OpenAI API documentation:
> The maximum number of [tokens](https://beta.openai.com/tokenizer) to generate in the completion.
> The token count of your prompt plus `max_tokens` cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
So the documentation first mentions the maximum number of tokens to generate in the completion, but then states that the token count of the prompt plus the completion must stay under 4000. I mention 4000 because it is the maximum token limit for the davinci model.
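The constraint the documentation describes can be sketched as simple arithmetic (the function name and numbers here are my own, for illustration; 4000 mirrors the davinci limit mentioned above, and the exact figure varies by model):

```python
def check_request(prompt_tokens: int, max_tokens: int, context_length: int = 4000) -> bool:
    """Return True if the request fits in the model's context window.

    The documented rule: prompt tokens + max_tokens must not exceed
    the model's context length.
    """
    return prompt_tokens + max_tokens <= context_length

# A 1,500-token prompt asking for up to 2,000 completion tokens fits:
print(check_request(1500, 2000))   # True
# ...but asking for up to 3,000 completion tokens exceeds the window:
print(check_request(1500, 3000))   # False
```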
So which is it?
Is it the maximum number of tokens that can be generated during the completion?
Thank you for answering my question about how this plays out with each model’s context length. This is super helpful for understanding the order of operations and when I could actually hit an error from mishandling these (I guess my brain works backward, learning these terms from how they behave).
That only applies to the completions endpoint, which makes setting a max_tokens value essentially required.
For the chat completions endpoint, you can simply not specify a max_tokens value, and then all of the remaining context space not used by the input can go toward forming a response, without tedious token-counting calculations to try to get close.
Reminder: max_tokens is a reservation of the model’s context length that is used exclusively for forming your answer, and it also sets a limit on how much comes back.
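The two behaviors above can be sketched together (a hypothetical helper, not library code; the 4096 figure is illustrative):

```python
from typing import Optional

def completion_budget(prompt_tokens: int, context_length: int,
                      max_tokens: Optional[int] = None) -> int:
    """Tokens available for the model's response.

    If max_tokens is omitted (as the chat completions endpoint allows),
    the whole remainder of the context window is available. If it is set,
    it both reserves that space and caps the response length.
    """
    remaining = context_length - prompt_tokens
    if max_tokens is None:
        return remaining
    if max_tokens > remaining:
        raise ValueError("prompt + max_tokens exceeds the context length")
    return max_tokens

# Unspecified: the full remainder of a 4096-token window is available.
print(completion_budget(1500, 4096))        # 2596
# Specified: the response is reserved and capped at 1000 tokens.
print(completion_budget(1500, 4096, 1000))  # 1000
```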