I’d like to know the range of values that top_p can take. I have already read the official documentation (https://platform.openai.com/docs/api-reference/chat/create), but it only describes top_p as the “top_p probability mass.”

What I particularly want to know is what happens when top_p is set to zero. Will only the single most probable token be selected, or will it have the same effect as setting top_p to null, effectively disabling it?
I’m aware that there are some discussions on this forum about setting top_p to zero, but is there any official information available anywhere?
top_p: 0 is different from top_p: 0.0000001.
What short-circuit, if any, the API applies for top_p: 0 (or temperature: 0) is currently unknown and is unlikely to be documented.
I did extensive trials back when OpenAI still offered deterministic GPT-3 models. In particular, I explored completions whose top-2 logits were nearly identical (which, incidentally, are hard to find), where temperature, but not top_p, could distinguish between them: you could get alternate answers.
Using a very small number instead of 0 selects the first logit when the top candidates appear identical to nucleus sampling. At that scale the nucleus cannot include more than one token, even if all 100k tokens in the vocabulary had identical logprobs. I prefer the setting whose behavior I actually know.
All models now have non-deterministic inference, so the actual top token can swap positions with another token of similar likelihood regardless of this setting.
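To illustrate the point about a very small top_p, here is a toy sketch of nucleus (top-p) sampling as it is commonly described; OpenAI has not published its exact server-side implementation, so treat this as an assumption about the general algorithm, not the API's actual code:

```python
import math

def nucleus_filter(logprobs, top_p):
    """Return the token indices kept by a toy nucleus (top-p) filter.

    Sketch of the commonly described algorithm, not OpenAI's actual
    implementation: sort by probability, accumulate mass, stop once
    the cumulative probability reaches top_p (keeping at least one).
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Sort token indices by probability, descending; ties keep list order.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)           # always keep at least one token
        cumulative += probs[i]
        if cumulative >= top_p:  # stop once the mass threshold is reached
            break
    return kept

# Four tokens with identical logits: a tiny top_p keeps exactly one,
# even though all four tie for "most probable".
print(nucleus_filter([0.0, 0.0, 0.0, 0.0], 1e-7))  # -> [0]
print(nucleus_filter([0.0, 0.0, 0.0, 0.0], 1.0))   # -> [0, 1, 2, 3]
```

With top_p that small, the cumulative-mass cutoff is reached after the very first token, so the nucleus can never hold more than one candidate.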
top_p does not exist in a vacuum.
You really have to consider the combined effect of top_p + temperature.
In short, top_p=0 is not a problem as long as temperature is set above zero. You run into problems when you set both to zero.
Therefore, when you want results that are as deterministic as possible (or for testing), use something like top_p=0.01 + temperature=0.
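As a sketch, those near-deterministic settings would look like this in a Chat Completions request body (the model name and message below are placeholders, not recommendations):

```python
# Hedged sketch of a Chat Completions request body using the
# near-deterministic settings discussed above. The field names follow
# the API reference; "gpt-4o-mini" and the prompt are placeholders.
request_body = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0,        # flatten randomness from temperature scaling
    "top_p": 0.01,           # tiny nucleus instead of exactly 0
}

print(request_body)
```

The same fields can be passed as keyword arguments to the official Python client's chat-completions call; only one of temperature or top_p is usually altered, but here both are pinned for maximum repeatability.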
I intended to add additional details, but I’ve cross-posted with @_j, who is (as always) right on the money about the very small number.
Thank you all for your responses!
I take it there is no official statement on what exact value is applied when top_p is set to 0. I think it’s a good idea for me to set top_p to a small finite value and temperature to 0 when I want results that are as deterministic as possible. Unlike top_p, the official documentation clearly states that temperature can range from 0 to 2. (I assume this temperature acts like the temperature in a softmax function, though no formula has been published for that either.)
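For reference, the usual softmax-with-temperature formula divides each logit by T before exponentiating, so lower T sharpens the distribution toward the top logit. This is the standard textbook form and only an assumption about what the API does, since OpenAI has not published its formula:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax: p_i ∝ exp(logit_i / T).

    Textbook formula only; whether the API implements exactly this
    is unpublished, so this is an illustrative assumption.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Lower temperature concentrates probability on the higher logit.
print([round(p, 3) for p in softmax_with_temperature([2.0, 1.0], 1.0)])  # -> [0.731, 0.269]
print([round(p, 3) for p in softmax_with_temperature([2.0, 1.0], 0.2)])  # -> [0.993, 0.007]
```

In the limit T → 0 this formula puts all mass on the largest logit, which is presumably why temperature=0 is treated as a greedy-decoding special case rather than evaluated literally (division by zero).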