In my experiments so far, which have involved Python and P5.js (built on top of JavaScript), I have been unable to obtain the same response/completion twice from the same prompt and parameter settings, even with T=0. For example, I may prompt Codex to “make balls bounce on the screen”. I created a preset that serves as a few-shot primer to get the appropriate code. The code generated is different each time. Are there recommended parameter settings (or specific prompt tweaks) to obtain determinism? I noted a similar question, but it was related to reproducing Challenge prompts.
What I have done so far is to save the P5.js sketch if I like it. That serves as an archive and promotes reproducibility.
There’s inherent nondeterminism in GPU calculations around floating point operations. The differences in log probabilities are tiny, but when there’s only a small difference between the two most likely tokens, a different token might be chosen every now and then, leading to different results.
There are speed tradeoffs: in order to make the endpoints fast, GPUs are used, which do parallel (nondeterministic) calculations. Any neural net calculation on a modern GPU will be subject to this.
Very simplified example to illustrate the point: a * b * c can be calculated either as (a * b) * c or as a * (b * c), but floating point operations round the last few significant digits, so the two orderings can give very slightly different results. Sometimes these tiny differences compound and are amplified within a network that takes the argmax over the next token, if the logprobs are very close.
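Here is a minimal sketch of that effect in plain Python. It uses addition rather than multiplication (the same rounding issue applies to both), and the two "token scores" are made-up numbers chosen only to show how a last-digit difference can flip an argmax. The helper name `pick_token` is hypothetical, and nothing here reflects how the actual endpoint computes its logprobs.

```python
def pick_token(scores):
    """Return the index of the highest score (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda i: scores[i])

# The same three numbers, added in two different groupings.
left_grouping = (0.1 + 0.2) + 0.3
right_grouping = 0.1 + (0.2 + 0.3)

# The results differ only in the last significant digit.
difference = abs(left_grouping - right_grouping)

# Hypothetical scores: token 0 has a fixed score of 0.6,
# token 1 has a score that depends on the order of the additions.
choice_left = pick_token([0.6, left_grouping])
choice_right = pick_token([0.6, right_grouping])

print("left grouping: ", repr(left_grouping))
print("right grouping:", repr(right_grouping))
print("difference:    ", difference)
print("token chosen:  ", choice_left, "vs", choice_right)
```

Once a single token differs, everything generated after it is conditioned on a different prefix, so the completions can diverge a lot from that point on.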
I was thinking through practical use-cases of this technology in a corporate setting and the stochastic nature was playing on my mind.
I can foresee trying to explain this to sales / helpdesk and getting worried looks - however - this is a new type of technology so rather than trying to fit existing paradigms people might need to get comfortable with new ones.
Yes, we gotta accept the stochastic nature of the neural nets. It’s as if there was a new (human) employee in the enterprise.
Even here, in this forum, I see a lot of people craving deterministic results at all costs. That thinking has to go, because we are co-working with the model, and that approach gives the best results.
Thanks for all of these excellent comments. It is not so much a question of the value of determinism, but instead the attempt to characterize the uncertainty given that we have a stochastic process. That the system is non-deterministic is fine. So, what does the uncertainty look like exactly? Bringing it home, if I were not so lazy, I would enter a prompt that ended with “Draw 10 circles”, run it 50 times, and try to understand the variance. Some of the variance may be due to the Python library chosen for drawing, or indeed to whether different programming languages are used. Which are used more or less often? I’ve also seen that some % of completions are invalid and will not compile/interpret.
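A rough sketch of what that experiment could look like in Python, under some assumptions: `generate` is a placeholder for whatever function actually calls the model (here it is replaced by a stub that cycles through canned strings, so no real API is used), the completions are assumed to be Python source, and "invalid" only means "does not parse", not "does not draw circles". The names `characterize` and `make_stub_generator` are made up for illustration.

```python
import ast
from collections import Counter
from itertools import cycle
from typing import Callable, Dict

def is_valid_python(source: str) -> bool:
    """True if the text parses as Python (a syntax check only)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def characterize(generate: Callable[[str], str], prompt: str, runs: int = 50) -> Dict[str, float]:
    """Call generate(prompt) repeatedly and summarize how much the output varies."""
    completions = [generate(prompt) for _ in range(runs)]
    counts = Counter(completions)
    invalid = sum(1 for text in completions if not is_valid_python(text))
    return {
        "runs": runs,
        "distinct": len(counts),
        "most_common_share": counts.most_common(1)[0][1] / runs,
        "invalid_share": invalid / runs,
    }

def make_stub_generator() -> Callable[[str], str]:
    """Stand-in for a real model call: cycles through three canned completions."""
    canned = cycle([
        "for i in range(10): circle(i)",
        "for i in range(10):\n    draw_circle(i)",
        "for i in range(10) circle(i)",  # missing colon, will not parse
    ])
    return lambda prompt: next(canned)

report = characterize(make_stub_generator(), "Draw 10 circles", runs=6)
print(report)
```

Swapping the stub for a real call and raising `runs` gives the share of distinct and non-parsing completions. Tallying which drawing library or language each completion uses would need an extra classification step that this sketch leaves out.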
On the same GPU? Or between different GPU families/drivers/…? I have been able to repeat GPU calculations with repeatable results on the same GPU.
Does this maybe imply that the server infrastructure mixes and matches different GPUs/drivers/…?
H100 compute clusters are not mix-and-match devices. There could be H100 and A100 blocks, and you could get either one, but not a mix of the two at the same time. I don’t see a coherent driver stack that would spread a model’s weights over different hardware.