Question about the Use of Seed Parameter and Deterministic Outputs

Hello everyone,

I have a couple of questions regarding the use of the seed parameter in API calls and achieving deterministic outputs:

  1. Use of Seed Parameter: I understand that the seed parameter is used to produce reproducible results. When exactly is this seed used during the response generation process? Also, what are the differences between seed, temperature, and top_p in terms of their roles and impacts on the output?

  2. Deterministic Outputs: I’ve noticed that some posts on the forum mention that even when the seed is fixed and both top_p and temperature are set to 0, the generated responses still vary. Is it truly impossible to achieve completely deterministic results in the current situation?

I would appreciate any insights or explanations on these points. Thank you!

It is truly impossible to achieve completely deterministic results in the current situation.

There are few situations where you actually need the same input to produce the exact same output sequence. Humans are easily fooled even by a run composed entirely of second-place tokens.

The output of a language model is not simply the best-predicted word produced one at a time. Human-like writing quality seems to increase when the generation can stray into new territory. (It is also an antidote to language-model flaws, such as falling into loops of repetition.)

This is done by sampling.

The model has 100,000 (or 200,000) tokens that it can produce. Inference gives each of them a score. Those scores are then combined into a normalized probability distribution, where the sum of all certainties = 1.0, or 100%.

Then the selection of the next token is made randomly, weighted by those probabilities.
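
To illustrate that sampling step, here is a minimal sketch (hypothetical logit values for a tiny vocabulary, standard-library Python only) that turns raw scores into a normalized distribution and picks a token at random, weighted by probability:

```python
import math
import random

# Hypothetical raw scores (logits) for a tiny five-token vocabulary.
logits = {" the": 6.2, " a": 5.9, " an": 4.1, " my": 2.8, " banana": -1.3}

# Softmax: convert the scores into probabilities that sum to 1.0.
max_logit = max(logits.values())  # subtracted for numerical stability
exps = {tok: math.exp(score - max_logit) for tok, score in logits.items()}
total = sum(exps.values())
probs = {tok: e / total for tok, e in exps.items()}

# Select the next token at random, weighted by those probabilities.
tokens, weights = zip(*probs.items())
next_token = random.choices(tokens, weights=weights, k=1)[0]
print(probs)
print("sampled:", next_token)
```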

This pseudorandom algorithm has a seed value. Provide the same seed, and every time you roll the dice you get the same results as the previous dice-rolling session.
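
A rough sketch of what "same seed, same dice rolls" means, reusing made-up probabilities like the ones above:

```python
import random

def sample_sequence(probs, seed, length=5):
    """Draw a short token sequence using a seeded pseudorandom generator."""
    rng = random.Random(seed)  # fixed seed -> reproducible draws
    tokens, weights = zip(*probs.items())
    return [rng.choices(tokens, weights=weights, k=1)[0] for _ in range(length)]

# With identical probabilities and the same seed, the draws repeat exactly.
probs = {" the": 0.45, " a": 0.35, " an": 0.15, " my": 0.05}
print(sample_sequence(probs, seed=42))
print(sample_sequence(probs, seed=42))  # identical to the line above
print(sample_sequence(probs, seed=7))   # different seed -> different rolls
```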

If the inference itself were not flawed, the same seed would give the same tokens for the same input, even with a high temperature that produces very unlikely phrases. However, OpenAI's language models currently do not output identical token scores between runs. That makes the seed somewhat useless.
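
To see why unstable scores defeat the seed, consider a sketch (made-up numbers) where the same seeded generator is fed probabilities that differ only slightly between runs:

```python
import random

def pick(probs, seed):
    """One seeded, probability-weighted token selection."""
    rng = random.Random(seed)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

run_a = {" cat": 0.52, " dog": 0.48}  # scores from one run
run_b = {" cat": 0.48, " dog": 0.52}  # same prompt, slightly shifted scores

# The seed is identical on both sides, yet the shifted probabilities flip some picks.
flips = sum(pick(run_a, s) != pick(run_b, s) for s in range(1000))
print(f"{flips} of 1000 seeded picks changed")  # roughly 40 expected here
```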

Paste from yesterday:

Top-p is performed first. When it is set below 1.0, the least probable tokens in the tail of the probability space are eliminated. A setting of 0.9 would allow only those tokens that occupy the top 90% of the probability mass.
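
A sketch of that nucleus (top-p) cutoff, using made-up probabilities:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep only the most probable tokens whose cumulative mass covers top_p."""
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the tail beyond this point is eliminated
    # Renormalize the survivors so they sum to 1.0 again.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

probs = {" the": 0.50, " a": 0.30, " an": 0.12, " my": 0.05, " banana": 0.03}
print(top_p_filter(probs, top_p=0.9))  # " my" and " banana" fall outside the top 90%
```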

Temperature is then a reweighting: reducing the value increases the chance of the most likely tokens, while a high value makes less-likely tokens more probable.
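
And a sketch of temperature as a reweighting (hypothetical logits; the scores are divided by the temperature before being renormalized):

```python
import math

def apply_temperature(logits, temperature):
    """Rescale scores by temperature, then renormalize with a softmax."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

logits = {" the": 3.0, " a": 2.0, " an": 1.0}
print(apply_temperature(logits, 0.5))  # low temperature: the top token dominates
print(apply_temperature(logits, 1.0))  # unchanged relative weighting
print(apply_temperature(logits, 1.5))  # high temperature: distribution flattens
```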


Great explanation, thanks!


If it is truly impossible, then OpenAI and Microsoft need to update their docs, as there are multiple official guides asserting that setting the seed to a fixed number will give you deterministic results for a given system fingerprint. Unfortunately, recent testing (for me at least) has failed to return any fingerprint at all, so it looks like a regression, and sadly there again seems to be no way to raise a bug against the API (it's independent of the SDK).
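
For reference, here is a minimal reproducibility check against the API (a sketch assuming the current openai Python SDK; the model name and prompt are just examples, adjust to your setup) that fixes the seed and reports whether any system_fingerprint comes back at all:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(seed):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; substitute the one you are testing
        messages=[{"role": "user", "content": "Name three primary colors."}],
        seed=seed,        # fixed seed, per the reproducibility docs
        temperature=0,
    )
    return response.choices[0].message.content, response.system_fingerprint

text_1, fp_1 = ask(seed=12345)
text_2, fp_2 = ask(seed=12345)
print("fingerprints:", fp_1, fp_2)        # None here means no fingerprint was returned
print("identical output:", text_1 == text_2)
```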
