Questions regarding API sampling parameters (temperature, top_p)

Hi. I have a few questions regarding the API parameters that control how tokens are sampled. In production I use gpt-3.5-turbo exclusively, but to access token probabilities I sometimes use text-davinci-003 in the Playground. I’d be very happy if someone could help me understand a little better what is happening.
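For context, this is roughly how I pull the token logprobs outside the Playground. It’s only a sketch against the legacy Completions endpoint with the pre-1.0 openai Python library, so the exact field names may differ in other versions:

```python
import openai

openai.api_key = "sk-..."  # your API key

# Legacy Completions endpoint (pre-1.0 openai Python library).
# logprobs=5 asks for the log probabilities of the 5 most likely
# tokens at each position, alongside the sampled token.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The capital of France is",
    max_tokens=1,
    temperature=1,
    top_p=1,
    logprobs=5,
)

choice = response["choices"][0]
print(choice["logprobs"]["tokens"])        # the sampled token(s)
print(choice["logprobs"]["top_logprobs"])  # per-position dict of token -> logprob
```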

  1. Default behavior.
    If I set top_p and temperature to 1 (the default values), then sampling just picks a random token according to its probability. So if there are two likely tokens, token1 with 79% and token2 with 20%, with the rest of the tokens sharing the last percent, and I run the same completion multiple times, then I’d (roughly) expect to see token1 in 4 out of 5 completions and token2 in 1 out of 5. Is this correct?
  2. The actual temperature value.
    The temperature value in the API does not seem to be the value actually used for the (re)assignment of probabilities. To my understanding, when temperature is used, the probability of every token is raised to the power 1/temperature and then renormalized (which is equivalent to dividing the logits by the temperature before the softmax). Alternatively, the temperature could be applied directly when normalizing, but this doesn’t seem to be the case, since the probabilities displayed in the Playground are not sensitive to temperature, so they have to be calculated prior to its application. Essentially, a temperature between 0 and 1 should increase the probability of already probable tokens, and a temperature > 1 increases the probability of formerly less probable tokens. Very high temperature values lead to a nearly uniform probability distribution over the tokens, so you basically pick a random token.
    If you set the temperature in the API to 2, you seem to get just random tokens. In the Playground you can even observe this directly (using text-davinci-003): set the temperature to 2 and the output contains tokens with a logprob of -15 and lower. If the temperature is applied as described above (or similarly), it must be higher than 2, because otherwise it wouldn’t cause this level of randomness. Basically, I’d like to understand which function maps the temperature parameter of the API to the value that is actually used, or at least how the API parameter scales (it doesn’t seem to scale linearly).
  3. Order when both parameters are modified.
    From what I observed in the Playground, it looks like top_p is applied first if both parameters are used. So if I choose top_p=0.99 and temperature=1.7, then top_p is applied first (throwing out nearly all of the tokens) and the probability of the remaining tokens is then recalculated based on the temperature. Since the order of application makes a big difference for which tokens can be sampled at all, this would be super interesting to know. Can somebody confirm that top_p is applied first? (I’ve sketched the order I’m assuming in the code after this list.)
  4. Setting one of the values to 0.
    If either top_p=0 or temperature=0 is used, the model will just output the most probable token. Is this correct?
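
To make questions 2 and 3 concrete, here is a small sketch of how I currently picture the sampling pipeline: temperature as division of the logits before the softmax, top_p as nucleus filtering, with a flag for which one runs first. The function, its names, and the order flag are just my assumptions for illustration, not how the API necessarily works internally:

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, top_p_first=True, rng=np.random):
    """Sketch of temperature + nucleus (top_p) sampling as I understand it.

    Assumption: temperature divides the logits before the softmax
    (equivalent to raising the probabilities to 1/temperature and
    renormalizing), and top_p keeps the smallest set of tokens whose
    cumulative probability reaches top_p. The top_p_first flag only
    illustrates question 3; I don't know which order the API uses.
    """
    logits = np.asarray(logits, dtype=np.float64)

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def apply_temperature(lg):
        if temperature == 0:            # question 4: temperature=0 -> greedy
            greedy = np.full_like(lg, -np.inf)
            greedy[lg.argmax()] = 0.0
            return greedy
        return lg / temperature

    def apply_top_p(lg):
        probs = softmax(lg)
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        # Keep the smallest prefix whose cumulative probability reaches top_p
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        masked = np.full_like(lg, -np.inf)
        masked[keep] = lg[keep]
        return masked

    steps = [apply_top_p, apply_temperature] if top_p_first else [apply_temperature, apply_top_p]
    for step in steps:
        logits = step(logits)

    probs = softmax(logits)
    return rng.choice(len(probs), p=probs)

# Question 1: with the default values, a 79% / 20% / 1% split should show up
# at roughly 4:1 for the two most probable tokens.
logits = np.log(np.array([0.79, 0.20, 0.005, 0.005]))
draws = [sample(logits) for _ in range(10_000)]
print(np.bincount(draws, minlength=4) / len(draws))
```

With the defaults this reproduces the roughly 4:1 split from question 1, and setting temperature=0 or top_p=0 collapses to the single most probable token, which is the behaviour I’m assuming in question 4.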

Looking forward to your answers. Thank you!
