For applications that require the most deterministic output possible, being notified that OpenAI has (otherwise stealthily) updated the model, and potentially changed the type of output that may be generated, is useful information to capture.
Thanks for the quick reply! Let me express myself a little more clearly.
From official OpenAI docs:
system_fingerprint: This fingerprint represents the backend configuration that the model runs with. Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
I did not mean system_fingerprint as a “seed” value itself that’s fed into the model, but rather as a “metaphorical hash” of the backend system configuration itself.
i.e. We can express generation as:
prompt + seed → backend → completion
And from the above it follows that a different backend (expressed as a different system_fingerprint value) may lead to a different completion, given the same prompt and seed pair as input.
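For example, a minimal sketch (using the openai Python SDK; the model name, prompt, and seed are just placeholders) of capturing system_fingerprint alongside a fixed seed so a backend change can be flagged:

```python
# Sketch: log system_fingerprint next to a fixed seed, so that a
# backend change (a different fingerprint) can be detected.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",   # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
    seed=12345,
    temperature=0,
)

# The same (prompt, seed) under a *different* fingerprint may produce
# a different completion, because the backend itself changed.
print(resp.system_fingerprint)
print(resp.choices[0].message.content)
```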
Also, there is very likely some amount of randomness coming from race conditions in plain floating-point calculation:
when the completion calculation is parallelized, some computations could finish in a different order and introduce tiny (~0.00…001) errors (due to finite floating-point precision).
While most of the time it won’t matter, in a tiny fraction of cases a different token would be selected, which in turn could potentially affect all of the remaining tokens due to how the transformer architecture works.
Although I’m not sure how prevalent this is (if it happens at all).
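To make that floating-point point concrete, here is a small plain-Python sketch (no GPU involved, the numbers are arbitrary) showing that summing the same values in a different order, as a parallel reduction might, changes the last bits of the result:

```python
# Floating-point addition is not associative: combining partial sums
# in a different order can produce a slightly different total.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

sequential = sum(xs)           # left-to-right order
shuffled = xs[:]
random.shuffle(shuffled)
reordered = sum(shuffled)      # a different combination order

print(sequential)
print(reordered)
print(sequential - reordered)  # tiny but (almost always) nonzero
```

If two logits ever end up that close together, an error of that size is enough to swap which token wins.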
GPT-3 models were deterministic. You put in the same input, you get exactly the same embeddings and exactly the same logit values and logprobs every time. So it is not a “transformer architecture” issue.
Math is math, and barring computational error in the processor, those bits get combined in the same way every time regardless of how complex the underlying processes are.
All OpenAI models now available are indeed non-deterministic. We don’t know why. Did they turn off ECC in the GPU for efficiency? A non-homogeneous hardware pool? Do they purposely apply “selective availability” to the outputs so that you can’t make stateful inspections of the underlying mechanisms? Whatever it is, you run 20 of the same embeddings or 20 of the same chat completions, you get different vectors and different logprobs almost every time, often resulting in position-switching of ranked tokens and ranked semantic search results.
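As a sketch of that kind of check (the model name is just an example), you can request the same embedding 20 times and compare the returned vectors:

```python
# Sketch: request the identical embedding 20 times and count how many
# distinct vectors come back. Bit-identical results would print 1.
from openai import OpenAI

client = OpenAI()

vectors = []
for _ in range(20):
    resp = client.embeddings.create(
        model="text-embedding-ada-002",   # example model name
        input="the same input every time",
    )
    vectors.append(tuple(resp.data[0].embedding))

print(f"distinct vectors out of 20: {len(set(vectors))}")
```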
That fingerprint changing indicates you’re going to get different results: they added training, re-weighted the model, or changed the inference architecture, so it is essentially like pointing your job at a different model, with no changelog.
The seed is part of the sampling that comes after logit calculation and softmax, which is meant to be random. You can ask the AI to roll 1d20 at temperature 1.5, and every call gets you different results because of the random token selection from all possibilities. Set the seed the same and you’d always get the same result back, except for the previously described issue that degrades the quality of the token mass that is the input to the sampler.
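To illustrate where the seed sits in that pipeline, a toy sketch (made-up logits, not OpenAI’s actual sampler):

```python
# Toy temperature sampling: the seed only affects the random draw that
# happens *after* logits and softmax. The logit values here are made up.
import math
import random

def sample(logits, temperature, seed):
    rng = random.Random(seed)                  # fixed seed => fixed draw
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return rng.choices(range(len(probs)), weights=probs)[0]

logits = [2.0, 1.9, 0.5]

# Different seeds: different picks (the "roll 1d20" behavior).
print([sample(logits, 1.5, seed=s) for s in range(5)])

# Same seed: the same token every time...
print([sample(logits, 1.5, seed=42) for _ in range(5)])
# ...but only if the logits feeding the sampler are themselves identical.
```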