I wonder if anyone knows why we get different results when running the same prompt multiple times in a row.
I have noticed in quite a lot of my experiments that if you set a cool-down time in between each run, the results tend to be consistent again. In all of these runs, I have set the temperature parameter to zero.
Do GPTs have any state? Could running prompts one after another influence each other?
OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain due to GPU floating point math.
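As a toy illustration of the floating-point point (plain CPU Python, not the actual GPU kernels, so treat it as an analogy only): floating-point addition is not associative, so summing the same values in a different order can round to a different result. GPU reductions may accumulate partial sums in varying orders across runs, which is one plausible source of tiny run-to-run differences in the logits.

```python
# Summing identical values in a different order gives a different
# rounded result, because floating-point addition is not associative.
a = sum([0.1, 0.2, 0.3])  # accumulates left to right: 0.1 + 0.2 + 0.3
b = sum([0.3, 0.2, 0.1])  # same values, reversed order

print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

If the accumulated logits differ even in the last bit, two tokens with nearly identical probabilities can swap rank, and the "deterministic" top token changes.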
Interesting question; I'm not sure, to be honest. I have often found the deterministic answers to be acceptable but not particularly innovative. I think a high temperature will produce a better answer some of the time and a worse one at others. Temperature 0 will get you a consistent reply, but potentially not the best one possible.
Hey @boris , do you think anything has changed in terms of the underlying asynchronous floating point operations in GPU in the past few months that might have increased the determinism in OpenAI endpoints?
If temperature=0, does setting top_p to any values (either 0 or 1 or a value in between) have any effect?
No effect!
Technically, the temperature is a divisor of the logits of possible tokens.
So let's say I have two possible tokens the AI might generate, and represent them one-dimensionally:
" the" = 0.3333
" a" = 0.2500
Dividing by temperature 0.5 is multiplying by 2:
" the" = 0.6666
" a" = .5000
which increases the distance between them with normalization.
A multinomial distribution function then picks a token according to those probabilities, the way a fair die shows face "1" 16.66% of the time.
So a temperature of 0.000001 massively favors the top token. Dividing by exactly 0 is undefined; I don't know what their code replaces it with, but presumably it special-cases temperature 0 as greedy selection of the top token.
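The arithmetic above can be sketched in a few lines. This is a toy illustration, not OpenAI's actual sampling code; the token strings and logit values are just the made-up numbers from the example, and the max-subtraction is the standard trick to keep `exp` from overflowing at tiny temperatures.

```python
import math
import random

def softmax_sample(logits, temperature):
    """Divide logits by temperature, softmax-normalize, then draw one
    token from the resulting multinomial distribution."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled.values())
    z = sum(math.exp(v - m) for v in scaled.values())
    probs = {tok: math.exp(v - m) / z for tok, v in scaled.items()}
    tokens = list(probs)
    choice = random.choices(tokens, weights=[probs[t] for t in tokens])[0]
    return probs, choice

logits = {" the": 0.3333, " a": 0.2500}

probs_t1, _ = softmax_sample(logits, temperature=1.0)
probs_t05, _ = softmax_sample(logits, temperature=0.5)
probs_tiny, _ = softmax_sample(logits, temperature=1e-6)

# Dividing by 0.5 widens the probability gap after normalization,
# and a near-zero temperature makes the top token all but certain.
print(probs_t1)    # roughly {' the': 0.52, ' a': 0.48}
print(probs_t05)   # roughly {' the': 0.54, ' a': 0.46}
print(probs_tiny)  # ' the' gets essentially all the probability
```

Note that even at temperature 1e-6 the top token is chosen by sampling from a distribution that is merely extremely peaked, which is why an exact temperature of 0 needs a special case.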
Top-p is a nucleus sampling parameter that can also be passed via the API; it removes low-probability tokens from consideration. A very low value restricts sampling to only the most likely tokens, which similarly pushes the output toward determinism.
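A minimal sketch of that filtering step (again a toy version, not OpenAI's implementation, and the example probabilities are invented): sort tokens by probability, keep the smallest set whose cumulative probability reaches `top_p`, and renormalize before sampling.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize to sum to 1."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {" the": 0.52, " a": 0.30, " an": 0.18}

# top_p=0.5: " the" alone already reaches 0.5, so it survives with prob 1.0
low = nucleus_filter(probs, 0.5)
# top_p=0.8: " the" + " a" reach 0.82 >= 0.8; " an" is cut, and the
# survivors are renormalized to 0.52/0.82 and 0.30/0.82.
mid = nucleus_filter(probs, 0.8)
print(low)
print(mid)
```

This is why a very low top_p behaves like a very low temperature: both collapse the candidate set toward the single most likely token.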
@boris I understand nothing has changed in the OpenAI model and software layers. How about cuDNN, CUDA, and the GPU drivers? Could something have changed there that might have increased determinism?