First, this is just wrong. There very much is a “real meaning” of temperature T = 0, division by zero notwithstanding. The meaning of T = 0 is greedy sampling: the limiting behavior of temperature-scaled softmax as T → 0 is a one-hot vector in which the element with the highest probability is mapped to 1 and all other elements are mapped to 0. So to say there is no real meaning of “temperature 0” is just flatly wrong.
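The T → 0 limit is easy to see in code. Here's a minimal NumPy sketch with a hand-rolled temperature softmax — illustrative only, not any particular inference stack's implementation:

```python
import numpy as np

def temperature_probs(logits, T):
    """Temperature-scaled softmax; T = 0 is handled as its limit."""
    if T == 0:
        # The T -> 0 limit: all probability mass collapses onto the
        # argmax, i.e. greedy sampling. No division by zero needed.
        one_hot = np.zeros_like(logits, dtype=float)
        one_hot[np.argmax(logits)] = 1.0
        return one_hot
    scaled = logits / T
    scaled -= scaled.max()        # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])
print(temperature_probs(logits, 1.0))   # smooth distribution
print(temperature_probs(logits, 0.01))  # already nearly one-hot
print(temperature_probs(logits, 0))     # exactly [1., 0., 0.]
```

As T shrinks, the distribution sharpens toward the argmax; defining T = 0 as that limit is exactly greedy sampling.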
Gonna need a big ol’ citation for that, sport. It is far easier and more appropriate to simply perform a greedy sample at T = 0.
I just fired off 50 runs in the playground and 50 out of 50 were the same.
¯\_(ツ)_/¯
Next I requested n = 50 responses through the API and they were all the same.
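For reference, the n = 50 experiment looks roughly like this with the openai Python client — the model name and prompt are placeholders, and the network call is skipped when no API key is present:

```python
import os

def all_identical(texts):
    """True if every completion in the list is byte-for-byte the same."""
    return len(set(texts)) <= 1

# Hypothetical reproduction of the n = 50 experiment. Requires the
# `openai` package and OPENAI_API_KEY in the environment.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": "Name a prime number."}],
        temperature=0,
        n=50,
    )
    texts = [choice.message.content for choice in resp.choices]
    print(f"{len(set(texts))} distinct responses out of {len(texts)}")
    # At T = 0 this is expected (though not guaranteed) to print 1.
```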
Now, there has been some discussion regarding the model not being perfectly deterministic at T = 0, but none of it has ever suggested that this is because OpenAI is not actually using a temperature of 0.
The two theories I’ve seen that hold the most weight (for me) are:
- GPT-4 is a sparse mixture-of-experts model, so when they batch tokens for evaluations, your input tokens can find themselves in a race condition with others. The end result becomes that the model is deterministic at the batch—not sequence—level. This is mentioned in the paper, From Sparse to Soft Mixtures of Experts.
- Some parts of the GPU parallelism employed may be non-deterministic. For instance, the order in which values are summed can propagate floating point inaccuracies. It’s possible these inaccuracies are the root cause of the non-determinism.
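The second theory is easy to demonstrate even on a CPU: floating-point addition is not associative, so reordering a sum changes the result, and a change in the last few bits of a logit is enough to flip an argmax between two near-tied tokens. A toy illustration in plain Python/NumPy (the logit values are hypothetical, not a claim about any specific kernel):

```python
import numpy as np

# Floating-point addition is not associative: the same three terms
# summed in a different order give different results. GPU reductions
# that don't pin down the summation order can hit this on every run.
a, b, c = 1e16, 1.0, -1e16
left_to_right = (a + b) + c   # 1.0 is absorbed into 1e16 first -> 0.0
reordered     = (a + c) + b   # the large terms cancel first    -> 1.0
print(left_to_right, reordered)  # 0.0 1.0

# A wobble that small is enough to flip greedy sampling when two
# logits are nearly tied (hypothetical values):
logits_run1 = np.array([2.0, 2.0 - 1e-6])
logits_run2 = logits_run1 + np.array([0.0, 2e-6])  # tiny numeric jitter
print(np.argmax(logits_run1), np.argmax(logits_run2))  # 0 1
```

Greedy sampling is still happening in both runs; the logits feeding it simply differ in the last bits.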
I consider both of these to be infinitely more likely than “OpenAI isn’t doing greedy sampling for T = 0.”