Why is GPT-4 giving different answers with same prompt & temperature=0?

This is my code for calling the gpt-4 model:

import openai  # the ChatCompletion interface requires openai<1.0

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": req},
]

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",  # deployment name (redacted)
    messages=messages,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

answer = response["choices"][0]["message"]["content"]

Keeping system_msg and req constant, with temperature=0, I get different answers. For instance, the last time I ran this 10 times I got 3 different answers. The answers are similar in concept, but worded differently.

I was expecting the exact same answer every time. Why is this happening?

You can try reducing top_p as well as temperature. That will narrow down the word choices even further.
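
For example, here is a minimal tweak to the call from your post (reusing messages from above, with the same placeholder engine name from your snippet):

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",
    messages=messages,
    temperature=0,
    top_p=0.1,  # keep only the highest-probability tokens as candidates
)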

But why is the top choice changing every time the GPT-4 model is called?

You may have two words with the same score in the top_p list of word/token choices.

In really simplistic terms, temperature tells it what percentage of the top words it can pick the next word from, based on the sum of their probabilities. (This is not 100% correct, but it helps explain the next part.)

A temperature of zero tells it to pick the top word from the top_p list. But that list of words can be huge, and some words may sit at the top with the same probability score, especially short, common words.

What top_p does is reduce the list of word choices. A value of 1 means the list contains every possible word, ranked from highest probability to lowest.

A top_p of 0 basically tells the AI to throw away all the words except the very top ones, so temperature has less to pick from. Even a value of 0.1 will make a big difference.

By reducing the number of words on the list, the temperature has less to pick from.
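
If it helps, here is a toy Python sketch of that idea (not OpenAI's actual decoder; the tokens and probability scores are invented):

# Toy illustration only: made-up tokens and scores (a real distribution
# covers the whole vocabulary and sums to 1).
probs = {"the": 0.18, "a": 0.18, "this": 0.12, "one": 0.08, "zebra": 0.0001}

def top_p_filter(probs, top_p):
    # Keep the smallest set of highest-ranked tokens whose cumulative
    # probability reaches top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return kept

print(top_p_filter(probs, top_p=1.0))  # keeps the whole (toy) list
print(top_p_filter(probs, top_p=0.1))  # keeps only the single top token

# Note that "the" and "a" are tied at 0.18, so which token counts as
# "the top one" is ambiguous: a decoder that breaks the tie differently
# on different runs picks a different word even at temperature=0.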

So try a temperature of 0 and a top_p of 0.1 and see if it makes a difference.

No harm in trying…

Note: I used “words” in the description above to make it easier to explain, but it is actually “tokens” that are being ranked; two different words could start with the same token.
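
If you want to look at the token level yourself, OpenAI's tiktoken tokenizer (a separate package, assuming you have it installed) shows how words map to token IDs:

import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
for word in [" there", " therefore", " the"]:
    # A word may map to one token or several; different words can
    # share their leading token.
    print(repr(word), "->", enc.encode(word))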


Thanks for your detailed reply! 😄

I tried again with top_p = 0.1
Don’t see any difference though.

What you said made sense to me for temp > 0: it will have fewer options with lower values of top_p. But since at temp=0 it simply picks the highest-probability option, I don't understand how changing top_p would make a big difference in this case.

I’m still seeing 3 different answers if I run it 10 times.

Sorry it didn't work. I was hoping it would make it even more rigid in its response.

The GPT generation process is non-deterministic by default. You can see a more thorough discussion of this in this thread (and its associated link). TL;DR: the problem is that the token with the “highest probability” is ill-defined due to the finite number of digits used when multiplying and storing the probabilities. Hope it helps 🙂
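
To illustrate that finite-precision point in plain Python (nothing specific to the API), adding the same floats in a different order usually produces a slightly different total, so a comparison between two near-tied probabilities can flip:

import random

random.seed(0)
vals = [random.random() for _ in range(100_000)]

# Float addition is not associative: summing the same numbers in a
# different order usually gives a slightly different total.
forward = sum(vals)
backward = sum(reversed(vals))
print(forward == backward)  # typically False
print(forward - backward)   # a tiny but nonzero difference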
