Temperature and top_p interactions?

Hello,

OpenAI “generally” recommends altering “temperature” or “top_p”, but not both.

This suggests that one can alter “temperature”, but then should leave “top_p” on its default value, and vice versa.

I would like to understand why these two parameters should NOT be altered at the same time, and what happens if they are.

Thank you.

Martin

They can be altered separately. You just have to understand what they do.

top_p is applied first. It limits the set of tokens that can be sampled, by cumulative probability mass: top_p = 0.90 keeps the smallest set of top tokens whose probabilities sum to at least 90% (inclusive). For the following unambiguous case, only the top-1 token could be selected:

[image: token probability chart for an unambiguous next-token case]

Then temperature is applied as a divisor of the logits: a value smaller than 1.0 widens the gap between the top token and the rest, making the most likely tokens even more likely to be sampled, while a value above 1.0 flattens the distribution.
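Here is a quick sketch of that divisor mechanic, with toy logits of my own rather than real model output:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature before the softmax.
    temperature < 1.0 sharpens the distribution; > 1.0 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits for three candidate tokens
logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))  # baseline distribution
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates more
print(softmax_with_temperature(logits, 2.0))  # flatter: choices closer to equal
```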

They can be used together. For example, here’s an ambiguous word choice: a top_p of 0.15 would limit sampling to just the top 5 tokens shown. A high temperature could then make the choices nearly equal in probability instead of “people” dominating the results.

[image: token probability chart for an ambiguous word choice, with “people” on top]
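The combined effect can be sketched by chaining the two steps. These probabilities are invented to mimic the scenario, not taken from the actual chart:

```python
import math

def apply_top_p(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative mass reaches top_p, then renormalize."""
    order = sorted(probs, key=probs.get, reverse=True)
    kept, cumulative = [], 0.0
    for tok in order:
        kept.append(tok)
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

def apply_temperature(probs, temperature):
    """Flatten or sharpen a distribution by dividing its log-probs
    by the temperature and renormalizing."""
    logs = {t: math.log(p) / temperature for t, p in probs.items()}
    m = max(logs.values())
    exps = {t: math.exp(v - m) for t, v in logs.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# hypothetical ambiguous word choice, "people" leading a long tail
probs = {"people": 0.05, "folks": 0.04, "users": 0.03,
         "customers": 0.02, "developers": 0.015, "penguins": 0.0001}
nucleus = apply_top_p(probs, 0.15)      # cuts the tail, keeps the top 5
flat = apply_temperature(nucleus, 5.0)  # high temperature: near-equal odds
```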

Top-p is good for cutting off the unpredictable tail of tokens that would serve to confuse and break formats.


Would you agree with this simpler formulation?

Temperature is a scaling factor and pushes probabilities up across the board, and top_p controls your probability cutoff.

If you crank up the temperature to raise the token probabilities across the board by a certain amount, but adjust top_p proportionally so that the same tokens get sampled as before, it’s the same as if you didn’t do anything at all.

temperature 0, top_p 1 should have the same output as temperature 2, top_p 0.


Setting either parameter to exactly 0 is mathematically invalid, so OpenAI substitutes a small value as a placeholder when you specify 0. That placeholder is not as small as a value you can enter yourself, such as 1e-9, and the difference can be evidenced in sampling surveys.

They are not the same and cannot be compared.

temperature: 0.01 still lets lottery-winning-odds tokens through; the results are just heavily weighted toward the top.
top_p: 0.01 allows only tokens within the top 1% of probability mass through, which leaves just one possible token even in some of the most unpredictable circumstances:

[image: token probability chart where the top token alone exceeds 1% of the mass]
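That asymmetry can be sketched with toy numbers of my own, not real model output:

```python
import math

# toy logits: a fairly confident top token plus two alternatives
logits = [5.0, 2.0, 1.0]

# temperature 0.01: every token keeps a nonzero (if astronomically small)
# probability; the tail is weighted down, never removed
scaled = [l / 0.01 for l in logits]
m = max(scaled)
exps = [math.exp(s - m) for s in scaled]
probs_t = [e / sum(exps) for e in exps]
print(probs_t)  # top token near 1.0, the others tiny but still nonzero

# top_p 0.01: the top token alone already exceeds 1% of the mass,
# so it is the only candidate left; the rest are hard-excluded
probs = [0.94, 0.05, 0.01]  # hypothetical softmax output
kept, cumulative = [], 0.0
for p in sorted(probs, reverse=True):
    kept.append(p)
    cumulative += p
    if cumulative >= 0.01:
        break
print(kept)  # [0.94], a single surviving token
```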
