GPT-3 models were deterministic: feed in the same input and you got exactly the same embeddings, the same logit values, and the same logprobs every time. So this is not a “transformer architecture” issue.
Math is math, and barring computational error in the processor, those bits get combined in the same way every time regardless of how complex the underlying processes are.
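To make that concrete, here is a toy forward pass (a dot-product layer plus softmax, nothing to do with OpenAI's actual stack) showing that the same floating-point code on the same inputs yields bit-identical outputs, not just approximately equal ones:

```python
import math
import random

def tiny_forward(x, w):
    # A toy "layer": dot products per output column, then a softmax.
    logits = [sum(xi * wij for xi, wij in zip(x, col)) for col in w]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
x = [random.random() for _ in range(8)]
w = [[random.random() for _ in range(8)] for _ in range(4)]

# Same inputs, same code path: the float bits come out identical.
a = tiny_forward(x, w)
b = tiny_forward(x, w)
assert a == b  # exact equality, not a tolerance check
```

Non-determinism only creeps in when something outside the math changes: different hardware, different reduction order in parallel kernels, or a different model behind the endpoint.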
All OpenAI models now available are indeed non-deterministic, and we don’t know why. Did they turn off ECC in the GPUs for efficiency? Is the hardware pool non-homogeneous? Do they purposely “selective availability” the outputs so that you can’t make stateful inspections of the underlying mechanisms? Whatever it is, run 20 of the same embeddings calls or 20 of the same chat completions and you get different vectors and different logprobs almost every time, often with position-switching of ranked tokens and of ranked semantic-search results.
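A sketch of how you might quantify that drift. The data here is simulated, not real API output; in practice you would collect the `top_logprobs` dicts from repeated identical `chat.completions` requests and feed them to a helper like this (the function names are my own invention):

```python
def rank_tokens(logprobs):
    # Order token strings by logprob, highest first.
    return [tok for tok, _ in sorted(logprobs.items(), key=lambda kv: -kv[1])]

def drift_report(runs):
    """Given repeated top-logprob dicts from identical requests, return
    (distinct logprob value sets, distinct token rankings)."""
    value_sets = {tuple(sorted(r.items())) for r in runs}
    rankings = {tuple(rank_tokens(r)) for r in runs}
    return len(value_sets), len(rankings)

# Simulated results of 3 identical requests; a tiny logprob jitter on
# the third call flips the ranking of the top two tokens.
runs = [
    {" yes": -0.693, " no": -0.694, " maybe": -2.3},
    {" yes": -0.693, " no": -0.694, " maybe": -2.3},
    {" yes": -0.695, " no": -0.694, " maybe": -2.3},
]
values, orders = drift_report(runs)
# values == 2 distinct logprob sets, orders == 2 distinct rankings
```

With a deterministic model both counts would be 1 no matter how many runs you collect; that is exactly what you no longer observe.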
That fingerprint changing indicates you’re going to get different results: they added training, reweighting, or inference-architecture changes, so it is essentially like pointing your job at a different model, with no changelog.
The seed parameter only affects the sampling stage that comes after logit calculation and softmax - the stage that is supposed to be random. Ask the AI to roll 1d20 at temperature 1.5 and every call gets you a different result, because a token is selected at random from the weighted distribution. Set the same seed and you would always get the same result back - except for the previously described issue, which degrades the quality of the token probability mass that is the input to the sampler.
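A minimal model of that pipeline, assuming a plain temperature-scaled softmax followed by a seeded categorical draw (the API's `seed` parameter plays the role of `seed` here; the "1d20" logits are made up):

```python
import math
import random

def sample(logits, temperature, seed=None):
    # Softmax over temperature-scaled logits, then a categorical draw.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    rng = random.Random(seed)  # seed=None -> fresh entropy each call
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [0.0] * 20  # a model indifferent between the 20 die faces

# The same seed pins the draw to the same face...
pick_a = sample(logits, temperature=1.5, seed=42)
pick_b = sample(logits, temperature=1.5, seed=42)
assert pick_a == pick_b
# ...but only while the logits themselves are identical. If the model
# hands the sampler a jittered distribution, the same seed can land
# on a different token - which is the failure mode described above.
```

In other words, a fixed seed makes the sampler reproducible, but it cannot repair reproducibility upstream of the sampler.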