Do I need to increase `max_tokens` when using `n>1` e.g. `n=3` for generating multiple chat completions

ehutt · June 30, 2023, 11:43pm

Do I need to increase max_tokens when using n>1 e.g. n=3 for generating multiple chat completions? For example, if I want to generate 3 possible completions each with a limit of 15 tokens, do I need to set max_tokens=45 instead of max_tokens=15 ?

p.s. I’m using gpt-3.5-turbo chat completions.

novaphil · July 1, 2023, 3:11am

What are you finding when you test those scenarios you mentioned?

charleswilliams1120 · July 1, 2023, 9:12am

I don’t know about for multiple… but when I’m playing around with it if I take the token count too high my bots get lost somewhere near their original token count and hallucinate a lot … best case scenario I’ve gotten was it kept repeating the last line it got stuck on

theevildays · July 1, 2023, 11:06am

Hey, hey, be careful using “n=X”, it will make your cost X times more!

sps · July 1, 2023, 4:52pm

Welcome to the OpenAI community @ehutt

When you pass n>1 you’ll get "n" number of completions, each following the specified max_token limit.

_j · July 1, 2023, 11:04pm

It doesn’t make your cost “X times more” to use N. It depends on if your cost comes mainly from input or output.

The best_of and n parameters may also impact costs. Because these parameters generate multiple completions per prompt, they act as multipliers on the number of tokens returned.

Your request may use up to num_tokens(prompt) + max_tokens * max(n, best_of) tokens, which will be billed at the per-engine rates outlined at the top of this page.

In the simplest case, if your prompt contains 10 tokens and you request a single 90 token completion from the davinci engine, your request will use 100 tokens and will cost $0.002.

So if you have a prompt “Scan this 15000-token book chapter text, count and return a response NumberOfSpellingErrors: X and NumberOfGrammarErrors: Y”, it would almost be silly not to request N=9 to ensure fluke-free reliable results - which would be far cheaper than asking again even once.

You can even do some trickery with baseline (non-chat) models, like do “best of 8” and “return N=3” where the best_of gives you the best logprobs for the whole answer and discards the five lowest.

theevildays · July 2, 2023, 9:22am

Hi, thanks for the clarification, what do you think how they might implemented “best” detection? I mean how do they determine which is the best?

Foxalabs · July 2, 2023, 1:08pm

You give them all back to the AI and ask it.

_j · July 2, 2023, 2:44pm

How is “best of” determined?

When the AI is completing an answer, it does so a token at a time (a word fragment). This is determined by a weighting algorithm from the massive knowledge and the prompt input.

For example, “Roll a six-sided die and give me the result.”, the highest probability token is “4”. Or “One example of a yellow fruit is a:”, and you’re going to get banana.

Producing exactly the same deterministic output for the input every time is not very creative, so a bit of randomness is introduced that can choose lower-probability words, controlled by softmax temperature. The conversation completion can then go in a different direction based on how to finish the sentence or more with the alternate word choice.

A total score can be assigned to the entire output of tokens and their probabilities to see how likely it was. An output with several divergences and the AI needing to adapt in new ways then scores lower than one that follows the charted path and doesn’t have language awkwardness.

I was going to give you an example within the playground of strictly completion using the most likely tokens, versus a string made of the 5th-likely tokens, (where instead of the next word being “headache” we continue with “sense”) but I got an awkward “end of text” token as 5th likely very soon, so play by the rules, the output would then be done:

Too many questions gives me a:
headache 92.6%  -- sense 0.37%
\n 65.57%  -- <|endoftext|> 0.01%
\n 99.97%  -- 
\n 79.59%  -- 
I 60.6%
'm 49.42%
 sorry 99.74%
 to 91.18%

From the percentages, you can still see the short second column would have a lower total score when the response is evaluated.

Topic		Replies	Views
The Relationship between Best of, Temperature and Top P (The Three Variable Problem) Prompting	10	4726	April 14, 2023
Questions on setting n and max_token API	4	935	March 20, 2024
How does `n` parameter work in chat completions? API gpt-35-turbo , chatgpt , api	12	13518	December 10, 2023
Clarification for max_tokens API codex	10	103452	December 12, 2023
Question regarding max_tokens Prompting	11	37996	December 13, 2023

Do I need to increase `max_tokens` when using `n>1` e.g. `n=3` for generating multiple chat completions

Related topics