Seed, if it had come out when GPT-3 was current or on the completions endpoint, would have allowed you to use a higher (default) temperature and still get repeatable outputs for identical inputs. When the models themselves produce different token-prediction values every run, though, it is quite pointless; it was delivered at DevDay 2023 as a false promise alongside `system_fingerprint`, and exposed as a waste of time shortly after.
The 'best' setting is a top_p of 1e-6, as this makes any conceivable distribution emit only the top-ranked token of a run. Larger models generally have lower perplexity, but this depends on the task and on the post-training: whether they have been made monotonous and overfitted.
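For illustration, a quick sketch with the openai Python SDK (the model name and prompt are placeholders, not anything from above):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# top_p=1e-6 truncates the nucleus to the single top-ranked token,
# approximating greedy decoding no matter the temperature.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Name one primary color."}],
    top_p=1e-6,
)
print(response.choices[0].message.content)
```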
The AI has 100,000 (or 200,000) tokens in its vocabulary that it can produce. Inference gives each of them a score. Those scores are combined into a normalized probability distribution, where the sum of all certainties = 1.0, or 100%.
The token is then selected randomly, based on those probabilities.
This pseudorandom algorithm has a seed value. Provide the same seed, and every time you roll the dice you get the same results as the previous dice-rolling session.
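A minimal sketch of that pipeline, with a toy five-token vocabulary standing in for the real 100k one:

```python
import numpy as np

logits = np.array([4.1, 3.7, 2.2, 0.5, -1.0])  # per-token scores from inference

# Softmax: exponentiate and normalize so all probabilities sum to 1.0.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
assert abs(probs.sum() - 1.0) < 1e-9

# Seeded sampler: the same seed reproduces the same "dice roll" every session.
rng = np.random.default_rng(seed=42)
token_id = rng.choice(len(probs), p=probs)
print(probs.round(4), token_id)
```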
If the inference itself were not flawed, the same seed would give the same tokens for the same input, even at a high temperature that produces very unlikely phrases. However, the OpenAI language models now do not output identical token scores between runs. That makes the seed somewhat useless.
All OpenAI models now available are indeed non-deterministic. We don’t know why. … Whatever it is, run 20 of the same embeddings or 20 of the same chat completions and you get different vectors and different logprobs almost every time, often resulting in position-switching of ranked tokens and of ranked semantic-search results.
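You can check this yourself. A hedged sketch (placeholder model and prompt) that makes the same one-token request twenty times and counts distinct top-token logprobs:

```python
from openai import OpenAI

client = OpenAI()

seen = set()
for _ in range(20):
    r = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=1,
        logprobs=True,
        temperature=0,  # removes sampling randomness; only logit drift remains
    )
    seen.add(round(r.choices[0].logprobs.content[0].logprob, 6))

# Deterministic inference would leave exactly one member in this set.
print(f"{len(seen)} distinct top-token logprobs across 20 identical calls")
```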
A changing `system_fingerprint` indicates you’re going to get different results: they added training, re-weighting, or inference-architecture changes, so it is essentially like pointing your job at a different model, with no changelog.
The seed is part of the sampling stage that comes after logit calculation and softmax, and that stage is meant to be random. Ask the AI to roll 1d20 at temperature 1.5, and every call gets you a different result because of the random token selection from all possibilities. Set the seed the same and you would always get the same result back, except for the previously described issue that degrades the quality of the token mass fed into the sampler.
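Locally, with toy numbers instead of real model logits, the mechanism looks like this: temperature rescales the logits before softmax, and a fixed seed makes the multinomial draw repeat exactly.

```python
import numpy as np

def sample(logits, temperature, seed):
    # Higher temperature flattens the distribution, making unlikely
    # tokens more reachable; the seed fixes the subsequent draw.
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return np.random.default_rng(seed).choice(len(p), p=p)

logits = [2.0, 1.5, 1.0, 0.2, -0.5]
print(sample(logits, temperature=1.5, seed=7))  # some token id
print(sample(logits, temperature=1.5, seed=7))  # identical: same logits, same seed
```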
The unreliable token scores can also destroy the usefulness of seed: you can’t repeat the particular token that was randomly selected if the token logprobs are different each time.
The multinomial sampler is provided a dictionary of logprobs and then tries to repeat its choice. However, if the probability space occupied by each token is different, a different token may be chosen even when the same random threshold (the seeded draw) that selects a token is repeated.
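Concretely, in an inverse-CDF view of the sampler (assuming the seed fixes the uniform draw u), the chosen token is whichever cumulative-probability bucket u lands in, so a small shift in the scores can move a bucket boundary across u:

```python
import numpy as np

def pick(probs, u):
    # Inverse-CDF sampling: the first token whose cumulative
    # probability exceeds the uniform draw u wins.
    return int(np.searchsorted(np.cumsum(probs), u))

# A fixed seed would give the same u every session, e.g.
# u = np.random.default_rng(seed).random(); use 0.38 to illustrate.
u = 0.38

run1 = np.array([0.40, 0.35, 0.25])  # token probabilities from one run
run2 = np.array([0.37, 0.38, 0.25])  # slightly shifted scores the next run

print(pick(run1, u))  # 0: u falls in token 0's bucket [0.00, 0.40)
print(pick(run2, u))  # 1: the boundary moved to 0.37, so u now picks token 1
```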