Clarification for max_tokens

nashid.noor · July 23, 2022, 9:36pm

My interpretation of max_tokens is that it specifies the upper bound on the length of the generated code.

However, the documentation is confusing. I am referring to the official OpenAI API documentation, which says:

The maximum number of [tokens](https://beta.openai.com/tokenizer) to generate in the completion.

The token count of your prompt plus `max_tokens` cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).

So at first the documentation mentions the maximum number of tokens to generate in the completion. But then it states that the token count of the prompt + completion must be < 4000. I mention 4000 because it is the maximum token limit for the davinci model.

So what is it?

Is it the maximum number of tokens that would be generated in the completion?
OR
token count of the prompt + `completion` < 4000

overbeck.christopher · August 30, 2022, 5:16am

token count of the prompt + `completion` < 4000

kathyh · March 9, 2023, 5:08pm

I’m going with @overbeck.christopher here and staying on the conservative side. The wording in context, as original poster @nashid.noor pointed out, remains confusing and leaves me questioning:

Should I add my set max_tokens to the token count of my prompt to arrive at a number no larger than the limit of the model I’m using?

These would be different numbers, but again I’ll work with the conservative approach for now.

sps · March 9, 2023, 5:24pm

Hi @nashid.noor, @overbeck.christopher and @kathyh

Every model has a context length. It cannot be exceeded.

As I shared above, max_tokens only specifies the maximum number of tokens to generate in the completion; it is not necessarily the number that will actually be generated.

However, if the sum of the tokens in the prompt + max_tokens exceeds the context length of the model, the request will be considered invalid and you’ll get a 400 error.

For example:

This model's maximum context length is 4096 tokens. However, you requested 4157 tokens (62 in the messages, 4095 in the completion). Please reduce the length of the messages or completion.
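
The check described above comes down to one addition. Here is a minimal sketch in Python, assuming the token counts are already known. The function name `check_request` is hypothetical, and the message only imitates the error quoted above; the real validation happens on the server.

```python
from typing import Optional


def check_request(prompt_tokens: int, max_tokens: int, context_length: int) -> Optional[str]:
    """Return an error message if prompt + max_tokens cannot fit, else None.

    Hypothetical helper for illustration; not part of any OpenAI library.
    """
    requested = prompt_tokens + max_tokens
    if requested > context_length:
        return (
            f"This model's maximum context length is {context_length} tokens. "
            f"However, you requested {requested} tokens "
            f"({prompt_tokens} in the messages, {max_tokens} in the completion)."
        )
    return None


# The numbers from the error above: 62 + 4095 = 4157 > 4096, so it is rejected.
print(check_request(62, 4095, 4096))

# Lowering max_tokens to 4034 makes the sum exactly 4096, which fits.
print(check_request(62, 4034, 4096))
```

Note that the request is rejected based on max_tokens itself, not on how many tokens the model would actually have produced.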

kathyh · March 9, 2023, 5:30pm

Thank you for answering my question in relation to how it plays out with each model’s context length. This is super helpful to understand the order of operations happening and when I could actually hit an error from mishandling these (my brain works backward seeing these terms in how they behave I guess).

10basetom · August 13, 2023, 1:10am

Another point of confusion is that max_tokens defaults to 16. Has anyone confirmed this? I haven’t used the API, but the ChatGPT website completions can be longer than 16 tokens.

_j · August 13, 2023, 1:21am

That default applies only to the completions endpoint, which makes setting the max_tokens value essentially required there.

For the chat completions endpoint, you can simply not specify a max_tokens value. All the remaining context space not used by the input can then be used for forming a response, without tedious token-counting calculations to try to get close.

Reminder, max_tokens is a reservation of the model’s context length that is exclusively for forming your answer, as well as setting a limit to how much comes back.
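
To make the "reservation" idea concrete, here is a small sketch of the arithmetic. It assumes the prompt's token count is already known, and `completion_budget` is a hypothetical helper, not part of any OpenAI library.

```python
from typing import Optional


def completion_budget(prompt_tokens: int, context_length: int,
                      max_tokens: Optional[int] = None) -> int:
    """Return how many tokens the response may use at most.

    Hypothetical helper: with max_tokens unset, the whole remaining
    context is available; with it set, that amount is reserved and
    also acts as the cap on the output.
    """
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone fills or exceeds the context length")
    if max_tokens is None:
        return remaining
    if max_tokens > remaining:
        raise ValueError("prompt tokens + max_tokens exceed the context length")
    return max_tokens


# No max_tokens: everything left after a 62-token prompt can be used.
print(completion_budget(62, 4096))        # 4034

# max_tokens=256: 256 tokens are reserved, and the output stops there at the latest.
print(completion_budget(62, 4096, 256))   # 256
```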

jagandevaki1 · October 3, 2023, 9:18am

"max_tokens only specifies the max number of tokens to generate in the completion": can you explain what the max number of tokens to generate in the completion is? What does "completion" mean here? Does it mean the response generated by the LLM?

curt.kennedy · October 7, 2023, 11:28pm

Yes, a completion is the response from the LLM. The word “completion” comes from the original models, which would return the most probable completion text for your input text. So, basically, autocompletion.

_j · October 8, 2023, 12:22am

Here is max_tokens set to 6, with a completion model. It does a very advanced version of writing what comes next:

The colored text is AI’s six tokens of completion output after my un-highlighted writing prompt.

Because this “completion” is so talented and versatile, we can give other writing formats for it to complete:

With only six tokens, we didn’t get much text. However, I can have it “complete” what it was writing again for another six tokens:

I like the name “Ava.” It is simple,

So the max_tokens value will cut off the AI’s output if you don’t set it large enough. The AI doesn’t know what this setting is.
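
The cut-off-and-continue behavior can be imitated with a toy stand-in for the model. This is only a sketch under loud assumptions: `SCRIPT` is made-up text, each whitespace-separated word counts as one "token" (real tokenizers split text differently), and `toy_complete` is hypothetical. The `"length"` and `"stop"` labels mirror the `finish_reason` values the API reports when output is truncated or ends naturally.

```python
# Made-up text the toy "model" wants to write; one word = one toy token.
SCRIPT = "The quick brown fox jumps over the lazy dog and then takes a nap".split()


def toy_complete(already_written, max_tokens):
    """Continue SCRIPT after what is already written, capped at max_tokens.

    Returns (new_tokens, finish_reason). The toy model does not look at
    max_tokens when choosing words; the cap is applied from outside.
    """
    remaining = SCRIPT[len(already_written):]
    output = remaining[:max_tokens]
    finish_reason = "length" if len(remaining) > max_tokens else "stop"
    return output, finish_reason


written = []
while True:
    chunk, reason = toy_complete(written, max_tokens=6)
    print(reason, "|", " ".join(chunk))
    written += chunk  # feed the output back in as part of the next prompt
    if reason == "stop":
        break
```

Each call returns at most six toy tokens, and appending the output to the prompt lets the next call pick up where the previous one was cut off.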

The chat endpoint puts all the “human” and “AI” banter into containers, and the model has been trained to perform in a conversation setting.
