I wonder if anyone knows why we get different results when running the same prompt multiple times in a row.
I have noticed in quite a lot of my experiments that if you set a cool-down time between runs, the results tend to be consistent again. In all of these runs, I have set the temperature parameter to zero.
Do GPTs have any state? Could running prompts one after another influence each other?
OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain due to GPU floating point math.
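For example, re-running an identical request a few times with temperature=0 (a sketch using the current openai Python client; the model name and prompt are just placeholders) will usually, but not always, return identical text:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

outputs = set()
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": "Name three primary colors."}],
        temperature=0,
    )
    outputs.add(resp.choices[0].message.content)

# Usually a single unique output at temperature=0, but occasionally more,
# due to non-deterministic GPU floating point accumulation.
print(len(outputs), outputs)
```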
Recently, we have been noticing “full determinism” in responses given the same prompt with temperature=0. Has something changed?
Welcome to the forum!
It is very deterministic, but you may see some variation over hundreds of runs.
So would using a temperature of 0 be the most accurate?
Interesting question, I’m not sure to be honest. I have often found the deterministic answers to be acceptable, but perhaps not innovative. I think a high temp will produce a better answer some of the time and a worse one at others. Temperature 0 will get you a consistent reply, but potentially not the best one possible.
Hey @boris, do you think anything has changed in terms of the underlying asynchronous floating point operations on the GPU in the past few months that might have increased the determinism of the OpenAI endpoints?
@Foxabilo If temperature=0, does setting top_p to any value (either 0 or 1 or something in between) have any effect?
Technically, temperature is a divisor applied to the logits of the candidate tokens.
So let’s say I have two possibilities for the token the AI might generate, and represent them one-dimensionally:
" the" = 0.3333
" a" = 0.2500
Dividing by temperature 0.5 is multiplying by 2:
" the" = 0.6666
" a" = 0.5000
which increases the distance between them after normalization.
A multinomial distribution function then picks tokens in proportion to their probability. A probability equivalent to one face of a die means you expect that face about 16.66% of the time.
So a temperature of 0.000001 massively favors the top token. Temperature 0 would mean dividing by zero; I don’t know what their code substitutes for that.
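A minimal sketch of that mechanic in Python (the logit values and the tiny divisor guard are my own illustrative assumptions, not OpenAI’s actual code):

```python
import numpy as np

tokens = [" the", " a"]
logits = np.array([0.3333, 0.2500])  # illustrative logits for the two candidates

def sample(logits, temperature, rng=np.random.default_rng(0)):
    scaled = logits / max(temperature, 1e-6)  # temperature divides the logits
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                      # softmax renormalization
    # Multinomial draw: pick each token in proportion to its probability
    return tokens[rng.choice(len(tokens), p=probs)], probs

print(sample(logits, temperature=1.0))   # close to a coin flip
print(sample(logits, temperature=0.5))   # gap between the two widens
print(sample(logits, temperature=1e-6))  # effectively always the top token
```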
Top-p is a nucleus sampling parameter that can also be passed via the API; it removes low-probability tokens from consideration. A very low value has an effect similar to a very low temperature.
(informed by GPT-2 code)
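And a rough sketch of the nucleus (top_p) filter itself, in the spirit of the GPT-2 sampling code (the probabilities are made up for illustration):

```python
import numpy as np

def top_p_filter(probs, top_p):
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches top_p; zero out the rest and renormalize.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.50, 0.30, 0.15, 0.05])
print(top_p_filter(probs, top_p=0.75))  # top two tokens survive (0.50 + 0.30 >= 0.75)
print(top_p_filter(probs, top_p=0.01))  # a very low top_p keeps just the top token
```

At temperature near 0 the top token wins regardless, so in practice top_p should make no visible difference in that case.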
@boris I understand nothing has changed in the OpenAI software layers. How about cuDNN, CUDA, and the GPU drivers? Could something have changed there that might have increased determinism?