Seed param and reproducible output do not work

That is not what the fingerprint is for.

It indicates an AI model update or revision.

For applications that require highly deterministic output, being notified that OpenAI has (otherwise stealthily) updated the model and changed the type of output that may be generated is useful information to capture.

That is not what the fingerprint is for.

Thanks for the quick reply! Let me express myself a little clearer.

From official OpenAI docs:

system_fingerprint: This fingerprint represents the backend configuration that the model runs with. Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

I did not mean system_fingerprint as a “seed” value itself that’s fed into the model, but rather as a “metaphorical hash” of the backend system configuration itself.

i.e. We can express generation as:

prompt + seed → backend → completion

And from the above it follows that a different backend (expressed as a different system_fingerprint value) may lead to a different completion, given the same prompt-and-seed pair as input.
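In code, that relationship suggests a simple guard: key cached results by (prompt, seed) and invalidate whenever the backend fingerprint differs from the one the cached result was produced on. A minimal sketch with made-up fingerprint values, not a real API call:

```python
# Sketch: cache completions keyed by (prompt, seed), and flag when the
# backend fingerprint changes. All names and values here are illustrative.
cache = {}  # (prompt, seed) -> (system_fingerprint, completion)

def get_or_flag(prompt, seed, fingerprint, completion):
    """Return a cached completion, or 'backend-changed' if the backend moved."""
    key = (prompt, seed)
    if key in cache:
        old_fp, old_completion = cache[key]
        if old_fp != fingerprint:
            # Same (prompt, seed), different backend: the cached output
            # may no longer be reproducible. Replace it and flag the change.
            cache[key] = (fingerprint, completion)
            return "backend-changed"
        return old_completion
    cache[key] = (fingerprint, completion)
    return completion
```

The point of the flag is exactly the notification discussed above: the same inputs may now map to a different completion, with no changelog.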


Also, there is very likely some amount of randomness coming from simple floating-point race conditions in the calculation:

when completion calculation is parallelized, some computations could finish sooner than others, reordering the arithmetic and leading to tiny (~0.00…001) errors (due to finite floating-point precision).

While most of the time it won’t matter, in a tiny fraction of cases a different token would be selected, which in turn could affect all of the remaining tokens due to how the transformer architecture works.

Although I’m not sure how prevalent (if at all) this is
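The finite-precision point is easy to demonstrate outside any model: floating-point addition is not associative, so the order in which partial sums are combined changes the result. A tiny self-contained Python illustration:

```python
# Adding the same three numbers in two different orders gives two answers,
# because 1.0 is below the rounding granularity (ulp) of 1e16.
a = sum([1e16, 1.0, -1e16])    # 1.0 is absorbed into 1e16 first -> 0.0
b = sum([1e16, -1e16, 1.0])    # the big values cancel first     -> 1.0
print(a, b)  # 0.0 1.0
```

In a parallelized logit calculation, partial sums can arrive in a different order run-to-run, producing exactly this class of tiny discrepancy.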

GPT-3 models were deterministic. You put in the same input, you get exactly the same embeddings and exactly the same logit values and logprobs every time. So it is not a “transformer architecture” issue.

Math is math, and barring computational error in the processor, those bits get combined in the same way every time regardless of how complex the underlying processes are.

All OpenAI models now available are indeed non-deterministic. We don’t know why. Did they turn off ECC in the GPU for efficiency? Is it a non-homogeneous mix of hardware in the pool? Do they purposely apply “selective availability” to the outputs so that you can’t make stateful inspections of the underlying mechanisms? Whatever it is, run 20 of the same embeddings or 20 of the same chat completions, and you get different vectors and different logprobs almost every time, often resulting in position-switching of ranked tokens and ranked semantic search results.

That fingerprint changing indicates you’re going to get different results - they added training, reweighting, or inference architecture changes, so it is essentially like pointing your job at a different model, with no changelog.

The seed is part of the sampling that comes after logit calculation and softmax, which is meant to be random. You can ask the AI to roll 1d20 at temperature 1.5, and every call gets you different results because of the random token selection from all possibilities. Set the seed the same and you’d always get the same result back - except for the previously described issue that degrades the quality of the token mass that is the input to the sampler.
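The pipeline described above can be sketched in a few lines: scale the logits by temperature, softmax them into probabilities, then draw once with a seeded RNG. This is an illustrative re-implementation with made-up logit values, not OpenAI's actual sampler:

```python
import math
import random

def sample_token(logits: dict, temperature: float, seed: int) -> str:
    """Softmax over temperature-scaled logits, then one seeded random draw."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    rng = random.Random(seed)  # seeded generator -> reproducible draw
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Same logits + same seed -> same token every time (given a fixed model).
logits = {"no": 2.1, "yes": 0.6, "Yes": -8.0}
assert sample_token(logits, 1.5, seed=42) == sample_token(logits, 1.5, seed=42)
```

Repeatability here rests entirely on the logits being identical between calls, which is exactly the assumption the non-deterministic backend breaks.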


You explained it well, thank you! The 1d20 example makes sense.


Does setting the temperature to 0 for the model gpt-4o-mini-2024-07-18 work now?

A temperature of 0 would be a divide-by-zero, so it is actually treated as a placeholder for a very low temperature if sent.
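The divide-by-zero comes from where temperature enters the math: logits are divided by T before the softmax, so T = 0 is undefined, and a tiny substitute value pushes nearly all probability mass onto the top token. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, t):
    """Standard temperature-scaled softmax; t = 0 would raise ZeroDivisionError."""
    scaled = [v / t for v in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# At t=1 the second token keeps real mass; at a tiny t it effectively vanishes.
print(softmax_with_temperature([2.0, 1.0], 1.0))   # ~[0.731, 0.269]
print(softmax_with_temperature([2.0, 1.0], 0.01))  # ~[1.0, ~0.0]
```

This is why a very low temperature serves as the stand-in for "0": the distribution collapses toward argmax without the undefined division.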

Trial 1, gpt-4o-mini:

USER

yes or no: Is a cashew apple actually a berry?

Enums

Token /Bias: {'yes': 0, 'no': 0}
Token#/Bias: {6763: 0, 1750: 0}
response_token[1750]

Response

RESPONSE content: {"answer":"no"}
RESPONSE token number(s): [1750]

Logprobs (to probability):
Token: "no"
Probability: 81.757379736705829%

Top Logprobs:
Token: "no"
Probability: 81.757379736705829%

Token: "yes"
Probability: 18.242537233967166%

Token: "Yes"
Probability: 0.000028339784657%

Token: " yes"
Probability: 0.000028339784657%

Trial 2:

USER

yes or no: Is a cashew apple actually a berry?

Enums

Token /Bias: {'yes': 0, 'no': 0}
Token#/Bias: {6763: 0, 1750: 0}
response_token[1750]

Response

RESPONSE content: {"answer":"no"}
RESPONSE token number(s): [1750]

Logprobs:
Token: "no"
Probability: 73.105753266954238%

Top Logprobs:
Token: "no"
Probability: 73.105753266954238%

Token: "yes"
Probability: 26.894102055251025%

Token: "Yes"
Probability: 0.000047342944659%

Token: " yes"
Probability: 0.000047342944659%


Conclusion: with token probabilities changing from 81% to 73% between calls, temperature is irrelevant for obtaining deterministic outputs. The underlying model is not deterministic.

What temperature would do in that case is shift the odds: instead of around 20 percent of the trials giving a different answer, you’d get the top answer with much higher probability - unless the logprobs vary, which they do.

Find a question where between calls you get 45% and 55% for “yes”, or just that kind of uncertainty throughout generation, and you’ll get rank-flipping and a different answer.

Did you set both temperature and seed?

That’s a good question: here’s why seed doesn’t improve the situation.

If you want guaranteed greedy sampling, where only the “best” (highest-probability) token is chosen, you’d use top_p: 0 - or, essentially, 0.000005 or lower serves the same role. Regardless of how tokens are generated, the top-ranked token cannot fall under that normalized value, so the top token will always be returned.
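The top_p mechanism sorts tokens by probability and keeps the smallest leading set whose cumulative mass reaches top_p; with a near-zero top_p, the top-ranked token alone already exceeds the threshold, so sampling degenerates to greedy. An illustrative sketch with made-up probabilities:

```python
def nucleus_filter(probs: dict, top_p: float) -> list:
    """Keep the smallest set of top-ranked tokens with cumulative mass >= top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"no": 0.82, "yes": 0.18, "Yes": 0.0000003}
assert nucleus_filter(probs, 0.000005) == ["no"]        # effectively greedy
assert nucleus_filter(probs, 0.9) == ["no", "yes"]      # nucleus keeps two
```

Note that "greedy" here still only guarantees the top-ranked token of that call - if the ranking itself flips between calls, as described next, the output still changes.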

However, with a non-deterministic underlying model, you can have two top tokens of similar value in the sorted generation that, in successive runs, might return:

Trial one: “Sure”: 6%, “Okay”: 4%
Trial two: “Okay”: 5%, “Sure”: 4.5%

They are switching places with a 6% changing to 4.5% and the 5% at the “top” now assigned to what was the second-place token in the previous generation. That is the issue. The certainties are moving around on you, and the model is generating different values each time.

A seed is used when you do allow the randomness of a non-zero top_p or temperature. It is supposed to allow repeatability, even when you allow the “creative” random sampling to be based on the distribution that was generated by the model without tweaks.


Seed: So take the same token values I just showed (where the remaining 90% covers many other tokens, like “Certainly”, that could potentially be sampled and written). A seed would ensure the random generator has the same output number - used for token selection.

From a value 0.0 to 1.0 within the distribution, seed re-use might give you 0.04 every time, and in a deterministic token dictionary from the same input, that would always give you a repeatable output.

In this case, however, picking from the sorted, ranked list of tokens, the same point of selection lands on a different token on each call if we “pick” what appears at 0.04. There are thousands of tokens, each taking up slightly more or less distribution space each time, so regardless of whether you sample at the same “point” by using a seed, what you get can differ at every token. Then it only takes one token flip for the rest of the answer to diverge.
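To make that concrete: walk two slightly different sorted distributions to the same seeded "point", and the chosen token can flip. A sketch with made-up numbers in the spirit of the 6%/4% example above:

```python
def pick_at_point(ranked_probs: list, point: float) -> str:
    """Walk the sorted distribution and return the token covering `point`."""
    cumulative = 0.0
    for token, p in ranked_probs:
        cumulative += p
        if point < cumulative:
            return token
    return ranked_probs[-1][0]  # fallback at the tail

# Same seeded selection point of 0.04, two runs with slightly shifted masses:
run1 = [("Sure", 0.06), ("Okay", 0.04)]   # 0.04 falls inside "Sure"
run2 = [("Okay", 0.045), ("Sure", 0.04)]  # 0.04 now falls inside "Okay"
assert pick_at_point(run1, 0.04) == "Sure"
assert pick_at_point(run2, 0.04) == "Okay"
```

Same seed, same selection point, different token - because the distribution underneath moved.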

Does setting a seed with temperature=0.0 consistently give repeatable outputs?


from litellm import completion

response = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "tell a random number with 4 decimal digits"}
    ],
    seed=42,
    temperature=0.0,
)

print(response.choices[0].message.content, response.system_fingerprint)

# With Seed
# Sure! Here's a random number with four decimal digits: 7.4823. fp_34a54ae93c
# Sure! Here's a random number with four decimal digits: 7.4823. fp_34a54ae93c

# Without Seed
# Sure! Here's a random number with four decimal digits: 7.4823. fp_34a54ae93c
# Sure! Here's a random number with four decimal digits: **3.5821**. fp_34a54ae93c

Nope:

# --- API Parameters (except seed) ---
api_base_parameters = {
    "messages": [system_message, user_message],
    "model": model,  # model = "gpt-4o-mini-2024-07-18"
    "max_completion_tokens": 30,
    "top_p": 1,
    "temperature": 0,
    "logprobs": True,
    "top_logprobs": 4,
    "logit_bias": {},  # No bias
}

Then trying my ambiguous question against:

seed = random.randint(0, 2**31 - 1)

And running batches of trials with the same seed; a trial run from batch 6:

*** NON-DETERMINISTIC FAULT DETECTED ***
Seed: 1346547828

— Call 1 (Trial 1) —
Answer: no
Token: "no"
Probability: 81.757371208747003%
Top Logprobs:
Token: "no"
Probability: 81.757371208747003%
Token: "yes"
Probability: 18.242535059287391%
Token: " yes"
Probability: 0.000032113183144%
Full JSON response: {"answer":"no"}

— Call 2 (Trial 2) —
Answer: yes
Token: "yes"
Probability: 62.245838378760276%
Top Logprobs:
Token: "yes"
Probability: 62.245838378760276%
Token: "no"
Probability: 37.754008291078286%
Token: " yes"
Probability: 0.000066460153032%
Full JSON response: {"answer":"yes"}

Replication instructions:
Use seed=1346547828 and the same prompt/model to attempt to reproduce this result.

"no" went from 81% to 38% probability with the same input between runs, with temperature=0 and a seed. Reusing the same seed sample “point” cannot help.