Questions regarding API sampling parameters (temperature, top_p)

Hi. I have a few questions regarding the API parameters that control how tokens are sampled. In production I use gpt-3.5-turbo exclusively, but to access token probabilities I sometimes use text-davinci-003 in the Playground. I’d be very happy if someone could help me understand a little better what is happening.
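For context, this is roughly how I pull the token logprobs outside the Playground. It’s only a sketch against the legacy Completions endpoint with the pre-1.0 openai Python library, so the exact field names may differ in other versions:

```python
import openai

openai.api_key = "sk-..."  # your API key

# Legacy Completions endpoint (pre-1.0 openai Python library).
# logprobs=5 asks for the log probabilities of the 5 most likely
# tokens at each position, alongside the sampled token.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The capital of France is",
    max_tokens=1,
    temperature=1,
    top_p=1,
    logprobs=5,
)

choice = response["choices"][0]
print(choice["logprobs"]["tokens"])        # the sampled token(s)
print(choice["logprobs"]["top_logprobs"])  # per-position dict of token -> logprob
```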

  1. Default behavior.
    If I set top_p and temperature to 1 (the default values), then sampling just picks a random token according to its probability. So if there are two likely tokens, token1 with 79% and token2 with 20%, with the rest of the tokens sharing the last percent, and I run the same completion multiple times, then I’d (roughly) expect to see token1 in 4 out of 5 completions and token2 in 1 out of 5. Is this correct?
  2. The actual temperature value.
    The temperature value in the API does not seem to be the value actually used for the (re)assignment of probabilities. To my understanding, when temperature is used, the probability of every token is raised to the power 1/temperature and then renormalized (which is equivalent to dividing the logits by the temperature before the softmax). Alternatively, the temperature could be applied directly when normalizing, but this doesn’t seem to be the case, since the probabilities displayed in the Playground are not sensitive to temperature, so they have to be calculated prior to its application. Essentially, a temperature between 0 and 1 should increase the probability of already probable tokens, and a temperature > 1 increases the probability of formerly less probable tokens. Very high temperature values lead to a nearly uniform probability distribution over the tokens, so you basically pick a random token.
    If you set the temperature in the API to 2, you seem to get just random tokens. In the Playground you can even observe this directly (using text-davinci-003): set the temperature to 2 and the output contains tokens with a logprob of -15 and lower. If the temperature is applied as described above (or similarly), it must be higher than 2, because otherwise it wouldn’t cause this level of randomness. Basically, I’d like to understand which function maps the temperature parameter of the API to the value that is actually used, or at least how the API parameter scales (it doesn’t seem to scale linearly).
  3. Order when both parameters are modified.
    From what I observed in the Playground, it looks like top_p is applied first if both parameters are used. So if I choose top_p=0.99 and temperature=1.7, then top_p is applied first (throwing out nearly all of the tokens) and the probability of the remaining tokens is then recalculated based on the temperature. Since the order of application makes a big difference for which tokens can be sampled at all, this would be super interesting to know. Can somebody confirm that top_p is applied first? (I’ve sketched the order I’m assuming in the code after this list.)
  4. Setting one of the values to 0.
    If either top_p=0 or temperature=0 is used, the model will just output the most probable token. Is this correct?
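
To make questions 2 and 3 concrete, here is a small sketch of how I currently picture the sampling pipeline: temperature as division of the logits before the softmax, top_p as nucleus filtering, with a flag for which one runs first. The function, its names, and the order flag are just my assumptions for illustration, not how the API necessarily works internally:

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, top_p_first=True, rng=np.random):
    """Sketch of temperature + nucleus (top_p) sampling as I understand it.

    Assumption: temperature divides the logits before the softmax
    (equivalent to raising the probabilities to 1/temperature and
    renormalizing), and top_p keeps the smallest set of tokens whose
    cumulative probability reaches top_p. The top_p_first flag only
    illustrates question 3; I don't know which order the API uses.
    """
    logits = np.asarray(logits, dtype=np.float64)

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def apply_temperature(lg):
        if temperature == 0:            # question 4: temperature=0 -> greedy
            greedy = np.full_like(lg, -np.inf)
            greedy[lg.argmax()] = 0.0
            return greedy
        return lg / temperature

    def apply_top_p(lg):
        probs = softmax(lg)
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        # Keep the smallest prefix whose cumulative probability reaches top_p
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        masked = np.full_like(lg, -np.inf)
        masked[keep] = lg[keep]
        return masked

    steps = [apply_top_p, apply_temperature] if top_p_first else [apply_temperature, apply_top_p]
    for step in steps:
        logits = step(logits)

    probs = softmax(logits)
    return rng.choice(len(probs), p=probs)

# Question 1: with the default values, a 79% / 20% / 1% split should show up
# at roughly 4:1 for the two most probable tokens.
logits = np.log(np.array([0.79, 0.20, 0.005, 0.005]))
draws = [sample(logits) for _ in range(10_000)]
print(np.bincount(draws, minlength=4) / len(draws))
```

With the defaults this reproduces the roughly 4:1 split from question 1, and setting temperature=0 or top_p=0 collapses to the single most probable token, which is the behaviour I’m assuming in question 4.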

Looking forward to your answers. Thank you!
