The max_tokens parameter of the completions endpoint determines the maximum number of tokens the model may generate, but it's unclear to me whether this limit applies to each response individually or to all responses combined. Example: if max_tokens is 10 and the number of responses n is 2, can the model generate 10 tokens for both the first and the second response, or can it only generate 10 tokens across the two responses? After some experimentation, it seems that max_tokens limits the number of tokens for each response.
For example, if you execute the following code, it prints the token count of each of the two responses, and both come out at 5 (the value of max_tokens):
import openai
import tokenizers  # Hugging Face tokenizers package

completions = openai.Completion.create(model="text-ada-001",
                                        prompt="tell me about quantum physics",
                                        temperature=1,
                                        max_tokens=5,
                                        n=2)

# text-ada-001 uses a GPT-2-style BPE, so the GPT-2 tokenizer gives a close approximation
tokenizer = tokenizers.Tokenizer.from_pretrained("gpt2")
for choice in completions["choices"]:
    print("Length of response:", len(tokenizer.encode(choice["text"]).tokens))
Please, I'd like an answer from OpenAI employees or developers, not guesses; I am already doing more than guessing here based on my experiments.
I also encourage you to update the docs (OpenAI API), which are ambiguous on this point, hence my question.