Temperature and top_p interactions?

They can be adjusted independently; you just have to understand what each one does.

top_p is applied first. It limits the set of tokens that can be sampled from: top_p = 0.90 keeps the smallest set of top tokens whose cumulative probability reaches 90% (inclusive of the token that crosses the threshold). In the following unambiguous case, only the top-1 token could be selected:

[image: token probabilities for an unambiguous completion, where the top token alone exceeds the 0.90 nucleus]
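
Here's a minimal NumPy sketch of that nucleus cut, assuming the model's softmax probabilities are already available as an array; the probability values are made up to mirror the unambiguous case above:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float = 0.90) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p (the token that crosses the threshold is included);
    zero out the rest and renormalize."""
    order = np.argsort(probs)[::-1]           # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # first index reaching top_p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Unambiguous case: one token dominates, so top_p=0.90 leaves only it.
probs = np.array([0.97, 0.01, 0.01, 0.005, 0.005])
print(top_p_filter(probs, 0.90))  # -> [1. 0. 0. 0. 0.]
```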

Then temperature is applied as a divisor of the logits: a value smaller than 1.0 widens the gap between the top token and the less likely ones, making the top choices even more dominant, while a value above 1.0 flattens the distribution.
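
A quick sketch of that temperature step, with hypothetical logits, showing how T below 1.0 sharpens the distribution and T above 1.0 flattens it:

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Divide logits by temperature, then softmax."""
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())       # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])            # illustrative values
print(apply_temperature(logits, 1.0))  # baseline:  ~[0.63 0.23 0.14]
print(apply_temperature(logits, 0.5))  # sharper:   ~[0.84 0.11 0.04]
print(apply_temperature(logits, 2.0))  # flatter:   ~[0.48 0.29 0.23]
```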

They can be used together. For example, here's an ambiguous word choice: a top_p of 0.15 would limit sampling to just the top 5 tokens seen below. A high temperature could then make those choices nearly equal in probability instead of "people" dominating the results.

[image: token probabilities for an ambiguous word choice, with "people" leading the top 5 candidates inside the 0.15 nucleus]
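
Putting the two together in the order described above (nucleus cut on the raw distribution first, then temperature over the survivors); the five leading probabilities and the long tail here are invented to mirror the ambiguous example:

```python
import numpy as np

def top_p_then_temperature(probs, top_p=0.15, temperature=2.0):
    """Nucleus cut first, then re-flatten the survivors with temperature."""
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                     # tokens inside the nucleus
    logits = np.log(probs[keep]) / temperature
    kept = np.exp(logits - logits.max())
    return keep, kept / kept.sum()

# Ambiguous next word: five plausible tokens, then a long tail of small ones.
tail = np.full(95, (1 - 0.152) / 95)          # hypothetical tail mass
probs = np.concatenate([[0.06, 0.04, 0.025, 0.015, 0.012], tail])
keep, p = top_p_then_temperature(probs)
print(keep)          # -> [0 1 2 3 4]: top_p=0.15 keeps just the top 5
print(p.round(3))    # temperature 2.0 evens them out: ~[0.29 0.24 0.19 0.15 0.13]
```

Without the temperature step, the leading token would take roughly 40% of the renormalized nucleus; at temperature 2.0 the five candidates end up close to equal.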

Top-p is good for cutting off the unpredictable tail of low-probability tokens that would otherwise confuse the output and break formats.
