Is the max_tokens parameter of the completions endpoint applicable for ALL or EACH response?

nbro · March 8, 2023, 3:07pm

The max_tokens parameter of the completions endpoint determines the maximum number of tokens the model is supposed to generate, but it’s unclear to me whether this limit applies to all responses together or to each response. Example: if max_tokens is 10 and the number of responses n is 2, can the model generate 10 tokens for both the first and the second response, or can it only generate 10 tokens across all responses? After some experimentation, it seems that max_tokens limits the number of tokens for each response.

For example, the following code prints 5 (the value of max_tokens), and we get two responses with that number of tokens:

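import openai  # legacy (pre-1.0) openai package, which provides openai.Completion
import tokenizers  # Hugging Face package

completions = openai.Completion.create(model="text-ada-001",
                                       prompt="tell me about quantum physics",
                                       temperature=1,
                                       max_tokens=5,
                                       n=2)

# Tokenize with the GPT-2 tokenizer to count the tokens in each completion.
tokenizer = tokenizers.Tokenizer.from_pretrained("gpt2")
# Measure every returned completion (e.g. "\n\nQuantum physics") instead of a hardcoded string.
for choice in completions.choices:
    response = choice.text
    print("Length of each response: ", len(tokenizer.encode(response).tokens))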

Please, I’d like an answer from an official OpenAI employee or developer, not guesses. I am already doing more than guessing here, based on my experiments.

I encourage you to update the docs (OpenAI API), which are ambiguous; hence my question.

ruby_coder · March 8, 2023, 3:08pm

Each API call

HTH

nbro · March 8, 2023, 3:13pm

I am not sure what you mean because, from my experiments, that’s not correct. max_tokens seems to limit the length of each response, not “each API call”, whatever that means.

ruby_coder · March 8, 2023, 3:46pm

Each API call means the response, actually (completion + prompt).

That is why I called it “each API call”

nbro · March 8, 2023, 3:49pm

But the response contains a lot more than what the model generated. So your answer doesn’t make sense to me, and it doesn’t seem consistent with the OpenAI API documentation, which describes max_tokens as “The maximum number of tokens to generate in the completion.” The completion is not the HTTP response! Even if by “response” you mean the model’s completion, that doesn’t answer my question. It seems that you didn’t even read my question.

ruby_coder · March 8, 2023, 3:58pm

nbro · March 8, 2023, 4:00pm

How is that useful? I am not sure whether you’re a bot or not, but honestly, you’re not being helpful.

ehutt · July 3, 2023, 11:20pm

I had the same question, posed on this forum under the headline “Do I need to increase max_tokens when using n>1 e.g. n=3 for generating multiple chat completions” (I wasn’t allowed to post the link in this comment).

Based on my own testing and the comments on my thread, it seems that the max_tokens parameter applies to EACH of the n generations you request. It will not try to fit ALL completions into your max_tokens limit. I was using gpt-3.5-turbo, but I assume the same applies to other completion models.
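If it helps, here is a rough sketch of how one might check this from a response without counting tokens by hand. It assumes the usual completions response shape (a choices list with finish_reason, plus a usage block) and that usage.completion_tokens is summed across all choices, which is my understanding rather than something confirmed in this thread. The summarize_completion helper and the mock_response dict are made up for illustration; the mock is hand-written and is not real API output.

```python
def summarize_completion(response, max_tokens):
    """Summarize per-choice limits for a completions-style response dict.

    Hypothetical helper: it only reads fields assumed to exist in the
    response ("choices", "finish_reason", "usage").
    """
    choices = response["choices"]
    n = len(choices)
    return {
        "n": n,
        # Upper bound on generated tokens if the limit applies to EACH choice.
        "max_total_generated": n * max_tokens,
        # Assumed to be the total over all choices, not a per-choice count.
        "completion_tokens": response["usage"]["completion_tokens"],
        # Choices cut off by max_tokens report finish_reason == "length".
        "truncated": [c["index"] for c in choices if c["finish_reason"] == "length"],
    }


# Hand-written mock in the shape of a completions response; NOT real API output.
mock_response = {
    "choices": [
        {"index": 0, "text": "...", "finish_reason": "length"},
        {"index": 1, "text": "...", "finish_reason": "length"},
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 10, "total_tokens": 15},
}

summary = summarize_completion(mock_response, max_tokens=5)
print(summary)
```

If max_tokens were a budget shared by all n choices, completion_tokens could never exceed max_tokens itself. Seeing it climb towards n * max_tokens, with every choice reporting finish_reason "length", would be consistent with the per-choice interpretation.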

Hope this helps
