The Relationship between Best of, Temperature and Top P (The Three Variable Problem)

First it’s important to recall (or clarify) a few things:

  • Completions are built by adding tokens one at a time based on their computed likelihoods (expressed as logprobs). Crucially, this is where the randomness of generated completions comes in: token selection is not deterministic, even though the logprobs for a given prompt are (see the sampling sketch just after this list).
  • Despite being fixed in principle for a given prompt and output token, logprobs may differ slightly between runs, because small numerical inaccuracies accumulate across the many floating-point operations involved in computing them. This is not where the randomness of completions comes from.
  • When n or best_of are provided as arguments, you’re effectively invoking the API max(best_of, n) times (with n and best_of omitted) to generate max(best_of, n) completions, from which n are selected.
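To make the first point concrete, here’s a minimal sketch of drawing a next token from a fixed set of logprobs. The token strings and values are made up, and this isn’t the API’s actual internals; it just shows how a deterministic distribution still yields non-deterministic draws:

```python
import math
import random

# Hypothetical logprobs for the candidate next tokens of some prompt.
logprobs = {" cat": -0.4, " dog": -1.5, " ferret": -3.2}

def sample_token(logprobs: dict[str, float]) -> str:
    tokens = list(logprobs)
    probs = [math.exp(lp) for lp in logprobs.values()]
    total = sum(probs)                      # renormalize in case of rounding
    probs = [p / total for p in probs]
    return random.choices(tokens, weights=probs, k=1)[0]

# Two runs over the exact same logprobs can pick different tokens:
print(sample_token(logprobs), sample_token(logprobs))
```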

With that in mind, here’s how it works: the n selected completions are simply the ones for which the sum of all the logprobs is highest, i.e. the most probable completions overall (not the ones with the highest logprob “per token”, as stated in the docs, whatever that would mean). A sketch of that selection rule follows.
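Here’s a small sketch of the rule, assuming each candidate completion is represented by its list of per-token logprobs (the candidates and numbers are hypothetical):

```python
# Generate max(best_of, n) candidates, keep the n whose *summed*
# logprob is highest, i.e. the most probable completions overall.
def select_completions(candidates: list[list[float]], n: int) -> list[list[float]]:
    return sorted(candidates, key=sum, reverse=True)[:n]

# Three hypothetical candidates (per-token logprobs), best_of=3, n=1:
candidates = [
    [-0.2, -0.5, -0.1],   # sum = -0.8  <- most probable overall
    [-0.1, -0.9, -0.4],   # sum = -1.4
    [-0.3, -0.3, -0.7],   # sum = -1.3
]
print(select_completions(candidates, n=1))  # [[-0.2, -0.5, -0.1]]
```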

Where temperature comes in is simply that it rescales the probability distribution for output tokens: values above 1 flatten it, making less likely tokens more likely to be selected at the expense of more likely ones, while values below 1 sharpen it.
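A minimal illustration of that flattening, using a made-up logit vector and the standard softmax-with-temperature formulation (not the API’s actual code):

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))  # baseline distribution
print(softmax_with_temperature(logits, 2.0))  # flatter: tail tokens gain mass
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
```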

And top_p is just an alternative way of tweaking the token probabilities: grab only the most likely tokens whose total probability adds up to at least p, and then choose from those (using a PDF renormalized over just that subset).
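Here’s a sketch of that top_p (“nucleus”) filtering, again with made-up probabilities:

```python
import random

# Keep the smallest set of most likely tokens whose cumulative probability
# reaches p, then renormalize over that subset. Illustrative only.
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())              # renormalize the truncated PDF
    return {t: pr / total for t, pr in kept.items()}

probs = {" the": 0.5, " a": 0.3, " an": 0.15, " that": 0.05}
nucleus = top_p_filter(probs, p=0.9)        # drops " that"; kept mass = 0.95
print(nucleus)
print(random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0])
```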

(Also note, as the careful reader will have discovered, that, simply as a convenience for human readability, the logprobs reported by the API are for words, not tokens.)
