Why the API output is inconsistent even when temperature is set to 0

It took a good amount of futzing around, and the prompt is just a happenstance leftover from other things I was trying, but I have an interesting result.

If you really want to have fun with statistics, run trials on two top tokens whose logprobs match to 8 digits of accuracy!

"top_logprobs": [
 {
  " Aug": -2.4173014,
  " Oct": -2.4173014,
  " Mar": -2.440739,
  " Jan": -2.440739
 }
]
  • Aug = 8.92%
  • Oct = 8.92%
  • Jan = 8.71%
  • Mar = 8.71%
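
(Those percentages are just the exponentials of the reported logprobs; a quick sanity check in Python:)

import math

top_logprobs = {" Aug": -2.4173014, " Oct": -2.4173014,
                " Mar": -2.440739, " Jan": -2.440739}

for token, lp in top_logprobs.items():
    # probability = e^logprob
    print(f"{token!r}: {math.exp(lp):.2%}")
# ' Aug': 8.92%   ' Oct': 8.92%   ' Mar': 8.71%   ' Jan': 8.71%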

model: davinci-002
max_tokens: 1

"prompt": """In the square brackets are 1000 random ASCII characters, using 0-9a-zA-Z: [0-9a-zA-Z]{1000}.

share|improve this answer

edited"""

Let’s run 70 trials at multiple settings. Extract the first letter each time.
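
(If you want to reproduce this, here's a minimal sketch of the trial loop using the openai-python v1 client against the legacy completions endpoint; the prompt string is abbreviated and the parameter handling is just illustrative.)

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "..."  # the full prompt shown above

def first_letters(n_trials, **params):
    # run n_trials single-token completions and collect the first letter of each
    letters = []
    for _ in range(n_trials):
        resp = client.completions.create(
            model="davinci-002",
            prompt=PROMPT,
            max_tokens=1,
            **params,  # e.g. top_p=0.0892, temperature=2
        )
        letters.append(resp.choices[0].text.strip()[:1])
    return "".join(letters)

print(first_letters(70, top_p=0.0891, temperature=2))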

“top_p”: 0.0891, temperature=2
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

“top_p”: 0.0892, temperature=2
OOOAAOAAAOOOAAAAOOAOAAOOAOOAOAOOAOAAOAAOOOOAAAOAAAOAAOOOAAAAOOOOAAOOAO

So there is an exact top_p threshold at which a second token becomes allowed.
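
That threshold sits right at the top token's probability: exp(-2.4173014) ≈ 0.08916, which falls between 0.0891 and 0.0892. Assuming the usual nucleus rule (keep the smallest set of tokens whose cumulative probability reaches top_p), a quick sketch shows why one setting admits one token and the other admits two:

import math

probs = [math.exp(lp) for lp in (-2.4173014, -2.4173014, -2.440739, -2.440739)]

def nucleus_size(probs, top_p):
    # count tokens kept: stop once cumulative probability reaches top_p
    cum = 0.0
    for i, p in enumerate(probs, start=1):
        cum += p
        if cum >= top_p:
            return i
    return len(probs)

print(nucleus_size(probs, 0.0891))  # 1 -> a single token survives (the all-O run)
print(nucleus_size(probs, 0.0892))  # 2 -> both tied tokens survive (the A/O mix)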

Let’s continue:

“top_p”: 0.0892, temperature=0.000000001 (mostly A)
OAAAAAAAAAAAAOAAAAOAAAAAAAAAAAAAAAOAAOAAOAOAAAAAAAAAOOAAAAOOOOAAOAOAAA

“top_p”: 0.0892, temperature=0.0000000001 (all A)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

And, believe it or not, switching from that minuscule temperature to exactly 0 changes the result:

First letter results of “top_p”: 0.0892, temperature=0.0
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

And if we then lift the top_p restriction entirely, it changes again:

First letter results of “top_p”: 1.0, temperature=0.0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The odd thing is that the temperature-limit and top_p-limit methods converge on different tokens of the two allowed, depending on the setting.

Are they literally tied as far as top_p is concerned, so the first one seen is picked, while temperature is able to put distance between the probabilities?
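
Here's a toy numpy illustration of that guess; the float values and the tie-breaking behavior are pure assumptions on my part, not anything the API documents:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

# Scenario 1: the stored logits tie exactly.
tied = np.array([-2.4173014, -2.4173014])          # say [" Oct", " Aug"]
print(np.argmax(tied))       # 0: an argmax (temperature 0) takes the first max it sees

# Scenario 2: they differ just beyond the 8 digits the API reports.
close = np.array([-2.417301448, -2.417301441])     # hypothetical internal values
print(softmax(close))        # ~[0.5, 0.5]: at temperature 1 they look tied
print(softmax(close / 1e-9)) # ~[0., 1.]: a tiny temperature makes the larger one certain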
