First, this is just wrong. There very much is a “real meaning” of temperature T = 0, the division by zero notwithstanding.
The meaning of T = 0 is greedy sampling. The limiting behavior of the tempered softmax as T → 0 is a one-hot vector: the element with the highest logit is mapped to 1 and every other element is mapped to 0.
So to say,
there is no real meaning of “temperature 0”
is just flatly wrong.
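To make the limit concrete, here’s a minimal NumPy sketch of temperature-scaled softmax (the logit values are made up for illustration): as T shrinks, the distribution collapses onto the argmax, which is exactly why T = 0 is conventionally special-cased as greedy sampling.

```python
import numpy as np

def tempered_softmax(logits, T):
    # Subtract the max for numerical stability, then scale by 1/T.
    z = (np.asarray(logits) - np.max(logits)) / T
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]                # hypothetical logits
print(tempered_softmax(logits, 1.0))    # smooth distribution
print(tempered_softmax(logits, 0.01))   # effectively one-hot at the argmax

# T = 0 itself is handled as a greedy sample: just take the argmax.
print(np.argmax(logits))  # → 0
```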
Gonna need a big ol’ citation for that, sport. It is far easier and more appropriate to simply perform a greedy sample with T = 0.
I just fired off 50 runs in the playground and 50 out of 50 were the same.
¯\_(ツ)_/¯
Next I requested n = 50 responses through the API and they were all the same.
Now, there has been some discussion about the model not being perfectly deterministic at T = 0, but none of that discussion has ever suggested the cause is that OpenAI isn’t actually using a temperature of 0.
The two theories I’ve seen that hold the most weight (for me) are,
- GPT-4 is a sparse mixture-of-experts model, so when tokens are batched for evaluation, your input tokens can find themselves in a race condition with tokens from other requests. The end result is that the model is deterministic at the batch level, not the sequence level. This is mentioned in the paper From Sparse to Soft Mixtures of Experts.
- Some of the GPU parallelism employed may be non-deterministic. For instance, the order in which values are summed can propagate floating-point inaccuracies. It’s possible these inaccuracies are the root cause of the non-determinism.
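The floating-point point is easy to demonstrate: addition isn’t associative, so a parallel reduction that sums the same values in a different order can land on a slightly different result, and between two near-tied logits that tiny discrepancy is enough to flip the argmax. A toy illustration (the logit value is hypothetical, chosen so one ULP matters):

```python
# Floating-point addition is not associative: summation order matters.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# If two logits are nearly tied, a rounding discrepancy this small can
# flip which token greedy sampling (argmax) picks. Hypothetical near-tie:
logit_a = 0.5 + (a - b)   # one ULP above 0.5
logit_b = 0.5
print(logit_a > logit_b)  # True, purely due to summation order
```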
I consider both of these to be infinitely more likely than “OpenAI isn’t doing greedy-sampling for T = 0.”
