Running the same query many times gives different results

I wonder if anyone knows why we get different results when running the same prompt multiple times in a row.

I have noticed in quite a lot of my experiments that if you set a cool-down time in between each run, the results tend to be consistent again. In all of these runs, I have set the temperature parameter to zero.

Do GPTs have any state? Could running prompts one after another influence each other?

OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain due to GPU floating point math.
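To illustrate the floating-point point with a toy example (this is ordinary CPU arithmetic, not OpenAI's actual GPU kernels): floating-point addition is not associative, so a parallel reduction that happens to accumulate values in a different order can round to a slightly different result.

```python
# Floating-point addition is not associative: the grouping (i.e. the
# order a parallel reduction happens to use) changes the rounding.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```

On a GPU, thread scheduling can change the accumulation order between runs, which is one way identical inputs can produce tiny logit differences that occasionally flip the chosen token.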


Recently, we have been noticing “full determinism” in responses given the same prompt with temperature=0. Has something changed?

Welcome to the forum!

It is very deterministic, but you may see some variation over hundreds of runs.


So would using a temperature of 0 be the most accurate?


Interesting question; I’m not sure, to be honest. I have often found the deterministic answers acceptable, but perhaps not innovative. I think a high temperature will produce a better answer some of the time and a worse one at others. 0 will get you a consistent reply, but potentially not the best one possible.


Hey @boris , do you think anything has changed in terms of the underlying asynchronous floating point operations in GPU in the past few months that might have increased the determinism in OpenAI endpoints?

@Foxalabs If temperature=0, does setting top_p to any values (either 0 or 1 or a value in between) have any effect?

> If temperature=0, does setting top_p to any values (either 0 or 1 or a value in between) have any effect?

No effect!

> do you think anything has changed in terms of the underlying asynchronous floating point operations in GPU in the past few months that might have increased the determinism in OpenAI endpoints?

No


Technically, the temperature is a divisor applied to the logits of candidate tokens.

So let’s say I have two candidate tokens the AI might generate, and represent their logits one-dimensionally:

" the" = 0.3333
" a" = 0.2500

Dividing by a temperature of 0.5 is the same as multiplying by 2:

" the" = 0.6666
" a" = 0.5000

which increases the distance between them after softmax normalization.

A multinomial distribution function then picks a token according to these probabilities, like rolling a die: a face with probability 16.66% comes up about one time in six.

So a temperature of 0.000001 massively favors the top token. Dividing by exactly 0 is undefined; I don’t know how their code handles that case.
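The mechanics described above can be sketched in a few lines. This is a minimal illustration, not OpenAI’s actual implementation; the function names are mine, and treating temperature 0 as greedy argmax is an assumption about how a division-by-zero might plausibly be special-cased.

```python
import math
import random

def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature):
    """Divide logits by temperature, softmax, then draw multinomially."""
    if temperature == 0:
        # Assumed special case: greedy argmax instead of dividing by zero.
        return tokens[logits.index(max(logits))]
    probs = softmax([x / temperature for x in logits])
    r = random.random()
    cum = 0.0
    for tok, p in zip(tokens, probs):
        cum += p
        if r < cum:
            return tok
    return tokens[-1]
```

With the two-token example above, a lower temperature visibly sharpens the distribution: `softmax([0.6666, 0.5000])` puts more mass on `" the"` than `softmax([0.3333, 0.2500])` does, and `sample(..., temperature=0)` always returns the top token.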

Top-p is a nucleus sampling parameter that can also be passed via the API, removing low-probability tokens from consideration. A very low value similarly restricts sampling to only the most likely tokens.

(informed by GPT-2 code)
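A hypothetical sketch of that nucleus (top-p) filtering, assuming the common formulation: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize. The function name is illustrative, not from any OpenAI code.

```python
def top_p_filter(tokens, probs, p):
    """Truncate a probability distribution at cumulative mass p.

    Keeps the fewest highest-probability tokens whose probabilities
    sum to at least p, then renormalizes the survivors.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {tokens[i]: probs[i] / total for i in kept}
```

For example, with probabilities [0.5, 0.3, 0.2] and p=0.7, the third token is dropped and the first two are renormalized to 0.625 and 0.375; a very low p (say 0.1) leaves only the single top token, matching the "corresponding effect" described above.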


@boris I understand nothing has changed in the OpenAI model and software layers. How about cuDNN, CUDA, and the GPU drivers? Could something have changed there that might have increased determinism?