If you were to actually examine the logits (the logprobs after softmax), you would have seen on GPT-3 models that the results were always the same. Something about the optimization of OpenAI models since then, or the hardware they run on, produces a small variance in the output values between runs, a fraction of a percentage point when examining the top probabilities.
Perhaps what you are wondering, though, is why you get significantly different responses each time.
That is due to token sampling.
The result of language model inference is a certainty score assigned to every token in the model's token dictionary (the token encoder's vocabulary). One could simply pick the top result for every token that is generated. However, it was discovered that such output doesn't actually sound very natural or human.
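As a rough illustration, that "always pick the top result" approach (greedy decoding) would look something like this, with invented logit values over a made-up four-token vocabulary:

```python
import numpy as np

# Invented raw scores (logits) over a tiny hypothetical vocabulary.
vocab = ["The", " cat", " pizza", "zzarella"]
logits = np.array([4.1, 2.3, 0.7, -1.5])

# Greedy decoding: always take the single highest-scoring token.
greedy_token = vocab[int(np.argmax(logits))]
print(greedy_token)  # -> "The", every single time
```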
Instead, the total scores are combined into a normalized probability distribution, where the sum of all certainties = 1.0, or 100%. Imagine a roulette wheel where the slot for "The" is wide because it is well predicted to start a generation, while the token "zzarella" is a poor way to start a sentence and gets an infinitesimal sliver of chance.
Thus, in any single trial, tokens appear with a frequency directly related to the model's predicted likelihood at that position.
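Here is a small sketch of that roulette wheel, again with invented logits: softmax turns the raw scores into probabilities that sum to 1.0, and the sampler draws a token in proportion to its slot width.

```python
import numpy as np

rng = np.random.default_rng()

# Invented logits for the same tiny four-token vocabulary.
vocab = ["The", " cat", " pizza", "zzarella"]
logits = np.array([4.1, 2.3, 0.7, -1.5])

# Softmax: exponentiate and normalize so the certainties sum to 1.0.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Spin the roulette wheel: each token's slot is as wide as its probability.
choice = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(4))), "->", choice)
```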
This direct correlation of certainty to probability can be altered with the sampling parameters top_p and temperature.
Top-p is applied first. When it is set below 1.0, the least probable tokens in the tail of the distribution are eliminated; a value of 0.9 would keep only the tokens that together occupy the top 90% of probability mass.
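A sketch of what such a top-p (nucleus) filter could look like; this is illustrative only, not OpenAI's actual implementation:

```python
import numpy as np

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize what is left."""
    order = np.argsort(probs)[::-1]               # indices, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.50, 0.30, 0.15, 0.05])
print(top_p_filter(probs, 0.9))   # the 0.05 tail token is zeroed out
```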
Temperature is then applied as a reweighting: reducing the value shifts more of the mass onto the most likely tokens, while a high value makes the probabilities more equal.
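One common way to implement temperature is to rescale the logits before the softmax; here is a sketch (with made-up logits) showing how a low value concentrates the mass on the top token and a high value flattens the distribution:

```python
import numpy as np

def apply_temperature(logits, temperature=1.0):
    """Rescale logits before softmax. Lower values sharpen the distribution
    toward the top token; higher values flatten it toward equal probabilities."""
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()

logits = [4.1, 2.3, 0.7, -1.5]
print(apply_temperature(logits, 0.2).round(3))  # nearly all mass on the top token
print(apply_temperature(logits, 2.0).round(3))  # much flatter, more varied output
```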
In this manner, you get an AI that sounds creative rather than robotic, and with your own tuning of the sampling parameters, you can suppress some of the unlikely choices.
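Putting it together, you can set both parameters on a request. This sketch assumes the current openai Python package (v1+) with an API key in the environment, and the model name is only a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder; use whatever chat model you have access to
    messages=[{"role": "user", "content": "Write a one-line slogan for a pizzeria."}],
    temperature=0.2,       # lower: closer to deterministic, "robotic" output
    top_p=0.9,             # trim the unlikely tail of tokens
)
print(response.choices[0].message.content)
```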