What is the default temperature setting of an assistant?

As it is currently not possible to set the temperature of an assistant (see forum question 486368), does anyone know what the default temperature setting is when running an assistant?
It would be very helpful to know, as we have to decide if we need to switch to the completions API (where it is possible to set sampling parameters).

Many thanks

As mentioned here (Introducing openairetro -- AssistantAPI to ChatCompletion), you can set `assistant.temperature` and then use Chat Completions in the background to fulfil the request.

In this way, you get the semantics of the Assistants API but the mechanics of Chat Completions.

Thanks @icdev2dev, that’s an interesting approach; I’ll have a look.

It would still be very helpful to know the internal settings of the Assistants API, because ideally we don’t need to do anything if the default values are already close to our requirements.

About the only way you might do that is to develop an input that has two distinct token probabilities (like a well-instructed “flip a coin”), get the logprobs, and then run 100 or more Chat Completions trials to verify the probabilities at temperature 1 and at adjustments within an expected range, such as 0.7.

Then place exactly the same input into Assistants, token for token, and run 1000 trials to get a histogram of the output probabilities.

If the token frequencies are further apart than in Chat Completions, run your Chat Completions trials again at a lower temperature until you close in on reproducible figures.
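To illustrate the idea without hitting the API, here is a self-contained sketch (plain Python, no OpenAI calls): sample repeatedly from a fixed two-token distribution at different temperatures and compare the observed frequencies, the way the trials above would.

```python
import math
import random

def temperature_probs(logprobs, t):
    """Rescale log-probabilities by temperature t and renormalize (softmax)."""
    scaled = [lp / t for lp in logprobs]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def trial_frequency(logprobs, t, trials, seed=0):
    """Run `trials` samples at temperature t; return observed frequency of token 0."""
    rng = random.Random(seed)
    probs = temperature_probs(logprobs, t)
    hits = sum(1 for _ in range(trials) if rng.random() < probs[0])
    return hits / trials

# "heads"/"tails" logprobs of a model that is 80/20 heads-biased at temperature 1
logprobs = [math.log(0.8), math.log(0.2)]

print(trial_frequency(logprobs, 1.0, 1000))  # close to 0.80
print(trial_frequency(logprobs, 0.7, 1000))  # sharper, close to 0.88
```

With enough trials, the histogram converges on the temperature-adjusted probabilities, which is what lets the comparison against Assistants output pin down its temperature.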

I think that the OP wants to know the default setting of the Assistants API, not Chat Completions.

Your method is quite informative and could be expanded to figure out the approximate temperature setting of the Assistants API, but by varying the input.

I do not discuss varying the input. Rather, what I describe is doing a statistical analysis on the outputs of Assistants with a selected model and input, and then replicating that on Chat Completions to find what temperature gives the same statistical results.

A Playground example to obtain two tokens of “choice”:

*(Playground screenshot not preserved.)*

At temperature 1.0, doing infinite trials, this (heads-biased) AI would give us “heads” 80% of the time. Reduce the temperature to 0.7, and the heads result becomes more certain, like 90%.

Precise numbers? AI with code interpreter will do the heavy lifting instead of taxing the little brain power I have…

To understand how temperature affects the probabilities in a multinomial sampler like GPT-2, let’s first delve into the concept of temperature in the context of softmax probabilities. The softmax function is used to convert raw logits (real-valued scores) from a model into probabilities. The temperature parameter ( T ) modifies the softmax function as follows:

\text{Softmax}(z_i) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}

where ( z_i ) is the logit for the ( i )-th token, and ( T ) is the temperature. A temperature of ( T = 1 ) keeps the probabilities as they are, higher temperatures (( T > 1 )) make the probabilities more uniform (less confident), and lower temperatures (( T < 1 )) make the distribution sharper (more confident).
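For intuition, a quick numeric check of that formula on the 80/20 pair (a minimal sketch in plain Python; the logits are chosen so that the temperature-1 softmax is exactly [0.8, 0.2]):

```python
import math

def softmax_t(logits, t):
    """Softmax with temperature t applied to raw logits."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Logits whose T=1 softmax is exactly [0.8, 0.2]
logits = [math.log(0.8), math.log(0.2)]

for t in (2.0, 1.0, 0.5):
    p = softmax_t(logits, t)[0]
    print(f"T={t}: P(token A) = {p:.3f}")
# T=2.0: 0.667 (flatter), T=1.0: 0.800 (unchanged), T=0.5: 0.941 (sharper)
```

Raising T pulls the distribution toward uniform; lowering it pushes probability mass onto the already-favored token, exactly as described above.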

Given two tokens with initial probabilities of 80% and 20%, we want to find the temperature ( T ) that alters these probabilities to 90% and 10%.

Step-by-Step Solution

  1. Define the Original Probabilities: The original probabilities are 80% for token A and 20% for token B.

  2. Apply the Softmax Function: Assume the logits for tokens A and B that result in these probabilities through the softmax function are ( z_A ) and ( z_B ). We don’t need their exact values, just the relationship between them and the desired probabilities.

  3. Adjust for Desired Probabilities: We want the new probabilities to be 90% for token A and 10% for token B. We’ll use the modified softmax function with temperature ( T ) to express these desired probabilities.

  4. Set Up Equations: We’ll set up equations based on the softmax function with temperature for the desired probabilities and solve for ( T ).

    For the desired probabilities, we have:

    \frac{e^{z_A/T}}{e^{z_A/T} + e^{z_B/T}} = 0.9
    \frac{e^{z_B/T}}{e^{z_A/T} + e^{z_B/T}} = 0.1

  5. Solve for ( T ): We’ll solve these equations for ( T ), keeping in mind that ( e^{z_A} ) and ( e^{z_B} ) are known from the original 80% and 20% probabilities.

Since the equations involve exponents and division, finding an analytical solution might be challenging. Instead, we can use numerical methods to find the value of ( T ) that satisfies the desired conditions.

Let’s proceed with the calculation using Python.
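The Python step was not included in the post; here is a minimal reconstruction. For two tokens no numerical solver is actually needed, because only the logit gap matters: dividing the two softmax equations gives ( q_A / q_B = (p_A / p_B)^{1/T} ), which yields a closed form for ( T ).

```python
import math

# Original probabilities at T=1 and the desired probabilities
p_a, p_b = 0.8, 0.2
q_a, q_b = 0.9, 0.1

# From q_a / q_b = (p_a / p_b) ** (1 / T):
t = math.log(p_a / p_b) / math.log(q_a / q_b)
print(round(t, 4))  # 0.6309

# Sanity check: re-apply the temperature and confirm we recover 0.9
check = (p_a ** (1 / t)) / (p_a ** (1 / t) + p_b ** (1 / t))
print(round(check, 4))  # 0.9
```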

To alter the probabilities from 80% and 20% to 90% and 10%, the temperature would need to be reduced to approximately ( T = 0.63 ). This reduction in temperature makes the probability distribution sharper, increasing the likelihood of the more probable token (from 80% to 90%) and decreasing that of the less probable one (from 20% to 10%).

We know now! The default is 1 and the range is 0 to 2.
It was just added as a parameter to the Run (not the Thread or the Assistant).


As per the docs there:

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
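For reference, a sketch of passing it at run creation with the Python SDK (assumes `openai` v1.x with the beta Assistants API; the thread and assistant IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# temperature is set per run, not on the thread or the assistant
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # placeholder ID
    assistant_id="asst_abc123",   # placeholder ID
    temperature=0.2,              # range 0-2; defaults to 1
)
```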
