Why the API output is inconsistent even after the temperature is set to 0

First, this is just wrong. There very much is a “real meaning” of temperature T = 0—the division by zero in the formula notwithstanding.

The meaning of T = 0 is greedy sampling. The limiting behavior of, say, softmax as T → 0 is to return a one-hot encoded vector in which the element with the highest sampling probability is mapped to 1 and all other elements are mapped to 0.
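To make that limit concrete, here is a small sketch (the function names are mine, not anything OpenAI publishes) of temperature-scaled softmax. As T shrinks toward 0, the distribution collapses onto the argmax, which is exactly greedy sampling:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Numerically stable softmax over logits / T.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_sample(logits):
    # The T = 0 limit: just take the argmax.
    return max(range(len(logits)), key=logits.__getitem__)

logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
# As T -> 0, the probabilities approach the one-hot vector [1, 0, 0],
# i.e. sampling degenerates into greedy_sample(logits).
```

Running this, the probability mass visibly concentrates on index 0 as T drops, matching `greedy_sample`.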

So to say,

there is no real meaning of “temperature 0”

is just flatly wrong.

Gonna need a big ol’ citation for that, sport. It is far easier and more appropriate to simply perform greedy sampling at T = 0.

I just fired off 50 runs in the playground and 50 out of 50 were the same.

¯\\_(ツ)_/¯

Next I requested n = 50 responses through the API and they were all the same.
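For anyone who wants to repeat the check, counting distinct responses is enough. This tiny helper (my own, not part of any SDK) works on whatever list of strings you pull out of the API response’s choices:

```python
def all_identical(responses):
    # True iff every response in the batch is byte-for-byte identical.
    return len(set(responses)) <= 1

# e.g. applied to the n = 50 completion texts:
print(all_identical(["The answer is 4."] * 50))            # True
print(all_identical(["The answer is 4.", "It's 4."]))      # False
```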

API results

Now, there has been some discussion of the model not being perfectly deterministic at T = 0, but none of it has ever suggested that the cause is OpenAI not actually using a temperature of 0.

The two theories I’ve seen that hold the most weight (for me) are,

  1. GPT-4 is a sparse mixture-of-experts model, so when they batch tokens for evaluation, your input tokens can find themselves in a race condition with others. The end result is that the model is deterministic at the batch—not sequence—level. This is mentioned in the paper, From Sparse to Soft Mixtures of Experts.
  2. Some parts of the GPU parallelism employed may be non-deterministic. For instance, the order in which values are summed can propagate floating point inaccuracies. It’s possible these inaccuracies are the root cause of the non-determinism.
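The second theory is easy to illustrate: floating-point addition is not associative, so the order in which a parallel reduction combines partial sums can change the result in the last bits:

```python
# Floating-point addition is not associative: grouping changes the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)
```

When the top two logits are nearly tied, a last-bit difference like this can flip the argmax, so even greedy decoding can occasionally diverge between runs.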

I consider both of these to be infinitely more likely than “OpenAI isn’t doing greedy-sampling for T = 0.”
