What is the default temperature setting of an assistant?

As it is currently not possible to set the temperature of an assistant (see forum question 486368), does anyone know what the default temperature setting is when running an assistant?
It would be very helpful to know, as we have to decide if we need to switch to the completions API (where it is possible to set sampling parameters).

Many thanks

As mentioned here (Introducing openairetro -- AssistantAPI to ChatCompletion), you can set `assistant.temperature` and then use Chat Completions in the background to fulfil the request.

In this way, you get the semantics of the Assistants API but the mechanics of Chat Completions.

Thanks @icdev2dev, that’s an interesting approach; I’ll have a look.

It would still be very helpful to know the internal settings of the Assistants API, because ideally we don’t need to do anything if the default values are already close to our requirements.

About the only way you might do that is to develop an input that has two distinct token probabilities (like a well-instructed “flip a coin”), get the logprobs, and then run 100 or more Chat Completions trials to verify the probabilities at temperature 1 and at adjustments within an expected range, such as 0.7.

Then place exactly the same input into Assistants, token for token, and run 1000 trials to get a histogram of the output probabilities.

If the token frequencies are further apart than in Chat Completions, run your Chat Completions trials again at a lower temperature until you close in on reproducible figures.
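To illustrate the idea without hitting the API, here is a self-contained sketch (plain Python, no OpenAI calls): sample repeatedly from a fixed two-token distribution at different temperatures and compare the observed frequencies, the way the trials above would.

```python
import math
import random

def temperature_probs(logprobs, t):
    """Rescale log-probabilities by temperature t and renormalize (softmax)."""
    scaled = [lp / t for lp in logprobs]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def trial_frequency(logprobs, t, trials, seed=0):
    """Run `trials` samples at temperature t; return observed frequency of token 0."""
    rng = random.Random(seed)
    probs = temperature_probs(logprobs, t)
    hits = sum(1 for _ in range(trials) if rng.random() < probs[0])
    return hits / trials

# "heads"/"tails" logprobs of a model that is 80/20 heads-biased at temperature 1
logprobs = [math.log(0.8), math.log(0.2)]

print(trial_frequency(logprobs, 1.0, 1000))  # close to 0.80
print(trial_frequency(logprobs, 0.7, 1000))  # sharper, close to 0.88
```

With enough trials, the histogram converges on the temperature-adjusted probabilities, which is what lets the comparison against Assistants output pin down its temperature.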

I think that the OP wants to know the default setting of the Assistants API, not Chat Completions.

Your method is quite informative and could be expanded to figure out the approximate temperature setting of the Assistants API, but by varying the input.

I do not discuss varying the input. Rather, what I describe is doing a statistical analysis on the outputs of Assistants with a selected model and input, and then replicating that on Chat Completions to find what temperature gives the same statistical results.

A Playground example to obtain two tokens of “choice”:

*(Playground screenshot not preserved.)*

At temperature 1.0, doing infinite trials, this (heads-biased) AI would give us “heads” 80% of the time. Reduce the temperature to 0.7, and the heads result becomes more certain, like 90%.

Precise numbers? AI with code interpreter will do the heavy lifting instead of taxing the little brain power I have…

To understand how temperature affects the probabilities in a multinomial sampler like GPT-2, let’s first delve into the concept of temperature in the context of softmax probabilities. The softmax function is used to convert raw logits (real-valued scores) from a model into probabilities. The temperature parameter ( T ) modifies the softmax function as follows:

\text{Softmax}(z_i) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}

where ( z_i ) is the logit for the ( i )-th token, and ( T ) is the temperature. A temperature of ( T = 1 ) keeps the probabilities as they are, higher temperatures (( T > 1 )) make the probabilities more uniform (less confident), and lower temperatures (( T < 1 )) make the distribution sharper (more confident).
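For intuition, a quick numeric check of that formula on the 80/20 pair (a minimal sketch in plain Python; the logits are chosen so that the temperature-1 softmax is exactly [0.8, 0.2]):

```python
import math

def softmax_t(logits, t):
    """Softmax with temperature t applied to raw logits."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Logits whose T=1 softmax is exactly [0.8, 0.2]
logits = [math.log(0.8), math.log(0.2)]

for t in (2.0, 1.0, 0.5):
    p = softmax_t(logits, t)[0]
    print(f"T={t}: P(token A) = {p:.3f}")
# T=2.0: 0.667 (flatter), T=1.0: 0.800 (unchanged), T=0.5: 0.941 (sharper)
```

Raising T pulls the distribution toward uniform; lowering it pushes probability mass onto the already-favored token, exactly as described above.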

Given two tokens with initial probabilities of 80% and 20%, we want to find the temperature ( T ) that alters these probabilities to 90% and 10%.

Step-by-Step Solution

  1. Define the Original Probabilities: The original probabilities are 80% for token A and 20% for token B.

  2. Apply the Softmax Function: Assume the logits for tokens A and B that result in these probabilities through the softmax function are ( z_A ) and ( z_B ). We don’t need their exact values, just the relationship between them and the desired probabilities.

  3. Adjust for Desired Probabilities: We want the new probabilities to be 90% for token A and 10% for token B. We’ll use the modified softmax function with temperature ( T ) to express these desired probabilities.

  4. Set Up Equations: We’ll set up equations based on the softmax function with temperature for the desired probabilities and solve for ( T ).

    For the desired probabilities, we have:

    \frac{e^{z_A/T}}{e^{z_A/T} + e^{z_B/T}} = 0.9
    \frac{e^{z_B/T}}{e^{z_A/T} + e^{z_B/T}} = 0.1

  5. Solve for ( T ): We’ll solve these equations for ( T ), keeping in mind that ( e^{z_A} ) and ( e^{z_B} ) are known from the original 80% and 20% probabilities.

Since the equations involve exponents and division, finding an analytical solution might be challenging. Instead, we can use numerical methods to find the value of ( T ) that satisfies the desired conditions.

Let’s proceed with the calculation using Python.
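The Python step was not included in the post; here is a minimal reconstruction. For two tokens no numerical solver is actually needed, because only the logit gap matters: dividing the two softmax equations gives ( q_A / q_B = (p_A / p_B)^{1/T} ), which yields a closed form for ( T ).

```python
import math

# Original probabilities at T=1 and the desired probabilities
p_a, p_b = 0.8, 0.2
q_a, q_b = 0.9, 0.1

# From q_a / q_b = (p_a / p_b) ** (1 / T):
t = math.log(p_a / p_b) / math.log(q_a / q_b)
print(round(t, 4))  # 0.6309

# Sanity check: re-apply the temperature and confirm we recover 0.9
check = (p_a ** (1 / t)) / (p_a ** (1 / t) + p_b ** (1 / t))
print(round(check, 4))  # 0.9
```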

To alter the probabilities from 80% and 20% to 90% and 10%, the temperature would need to be reduced to approximately ( T = 0.63 ). This reduction in temperature makes the probability distribution sharper, increasing the likelihood of the more probable token (from 80% to 90%) and decreasing that of the less probable one (from 20% to 10%).

We know now! The default is 1 and the range is 0 to 2.
It was just added as a parameter to the Run (not the Thread or the Assistant).


As per the docs there:

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
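For reference, a sketch of passing it at run creation with the Python SDK (assumes `openai` v1.x with the beta Assistants API; the thread and assistant IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# temperature is set per run, not on the thread or the assistant
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # placeholder ID
    assistant_id="asst_abc123",   # placeholder ID
    temperature=0.2,              # range 0-2; defaults to 1
)
```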
