Truly deterministic results are impossible to achieve in the current situation.
There are also few situations where you'd actually need the same input to produce the exact same output sequence: humans are easily fooled even by a run built entirely from second-place tokens.
The output of a language model is not simply the best-predicted word emitted one at a time. Human-like writing quality seems to improve when generation can stray into new territory, and that straying also acts as an antidote to language-model flaws such as falling into loops of repetition.
This is done by sampling.
The model has 100,000 (or 200,000) tokens it can produce. Inference gives each of them a score. The scores are then combined into a normalized probability distribution, where the sum of all certainties = 1.0, or 100%.
The next token is then selected at random, weighted by those probabilities.
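A minimal sketch of those two steps in NumPy, using a made-up five-token vocabulary (a real model emits one score per token across its ~100k-entry vocabulary):

```python
import numpy as np

# Hypothetical raw scores (logits) for a tiny five-token vocabulary.
logits = np.array([4.0, 2.5, 1.0, 0.5, -1.0])

# Softmax: exponentiate and normalize so the probabilities sum to 1.0.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Select the next token at random, weighted by its probability.
rng = np.random.default_rng()
token_id = int(rng.choice(len(probs), p=probs))
```

Note that `rng.choice` can still return a low-scoring token; it is just unlikely to, in proportion to that token's share of the probability mass.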
This pseudorandom algorithm has a seed value: provide the same seed, and every time you roll the dice you get the same results as the previous dice-rolling session.
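To illustrate the seed behavior, here is a toy repeat of the dice-rolling over a fixed four-token distribution (the probabilities are invented for the example):

```python
import numpy as np

# A fixed, made-up probability distribution over four tokens.
probs = np.array([0.5, 0.3, 0.15, 0.05])

def sample_sequence(seed, n=10):
    # Same seed -> same pseudorandom draws -> same token sequence.
    rng = np.random.default_rng(seed)
    return [int(rng.choice(len(probs), p=probs)) for _ in range(n)]

run_a = sample_sequence(seed=42)
run_b = sample_sequence(seed=42)
assert run_a == run_b  # identical dice-rolling sessions
```

This only guarantees reproducibility when the probabilities themselves are identical between runs, which is exactly the condition the next paragraph says is violated.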
If inference itself were not flawed, the same seed would give the same tokens for the same input, even at a high temperature producing very unlikely phrases. However, the OpenAI language models currently do not output identical token scores between runs, which makes the seed somewhat useless.
Paste from yesterday:
Top-p is performed first. When set below 1.0, the least probable tokens in the tail of the distribution are eliminated; a setting of 0.9 keeps only the tokens that occupy the top 90% of probability mass.
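A sketch of top-p (nucleus) filtering under that description: sort descending, keep the smallest set of tokens whose cumulative mass reaches the threshold, zero the tail, and renormalize. The function name and the four-token distribution are made up for the example:

```python
import numpy as np

def top_p_filter(probs, top_p=0.9):
    # Sort token probabilities in descending order.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative mass reaches top_p
    # (always at least the single most likely token).
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    # Renormalize the survivors so they sum to 1.0 again.
    return filtered / filtered.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
nucleus = top_p_filter(probs, top_p=0.9)
# The 0.05 tail token is eliminated; the rest are renormalized.
```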
Temperature is then a reweighting: reducing the value increases the chance of the most likely tokens, while a high value instead makes less-likely tokens more probable.