Observing discrepancies in completions with temperature = 0

When querying the /completions API several times with a temperature of 0, I still observe some differences in the responses. The differences are usually subtle but can be large for more complicated prompts. See the example below for reproduction.

Screenshot

Code

import os
import json
import requests

API_KEY = os.environ["OPENAI_API_KEY"]

# Request the same completion three times with temperature 0.
completions = []
for i in range(3):
    print(i)
    res = requests.post(
        "https://api.openai.com/v1/completions",
        json={
            "prompt": "A logical",
            "model": "text-davinci-003",
            "temperature": 0.0,
            "max_tokens": 200,
        },
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    completions.append(json.loads(res.text)["choices"][0])

# Print every pair of completions whose texts differ.
for c1 in completions:
    for c2 in completions:
        if c1["text"] != c2["text"]:
            print(c1["text"])
            print(c2["text"])
            print()

@sam_nabla do you expect identical completions, even at temp 0, for a prompt like the one above from a probabilistic LLM?

:question:

:slight_smile:

1 Like

Yes I do.

I interpret the temperature as the temperature of the softmax used for sampling (applied to the final logits of the transformer), and a temperature of 0 basically turns it into an argmax: the next token chosen is the one with the highest score. So I expect the output to be deterministic.
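To illustrate what I mean, here is a toy NumPy sketch with made-up logits (not the actual API internals):

import numpy as np

def sample(logits, temperature, rng):
    """Temperature-scaled sampling; temperature 0 falls back to greedy argmax."""
    if temperature == 0.0:
        return int(np.argmax(logits))                 # pick the highest-scoring token
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())             # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = [2.0, 1.9, 0.5]                              # toy scores for three tokens
print([sample(logits, 1.0, rng) for _ in range(10)])  # stochastic: samples from the softmax
print([sample(logits, 0.0, rng) for _ in range(10)])  # deterministic: always token 0 (argmax)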

But maybe it’s not exactly the meaning of temperature for this API?

1 Like

Hi @sam_nabla

I think you are right!

I ran your prompt 10 times using temp 0 and got the same completion 10 times in a row.

Testing

Ran it like this with these params:

Model: text-davinci-003, Temperature: 0, Max Tokens: 1024, Completion Reason: stop

This is a very interesting question that has been around for some time. In my view, the most comprehensive answer was given here: A question on determinism

Even though your hypothesis is right in principle, @sam_nabla, it doesn’t hold empirically. Even with a greedy decoding strategy, small discrepancies in floating-point operations lead to divergent generations. In simpler terms: when the top two tokens have very similar log-probs, there’s a non-zero probability of picking the less probable one, because of the finite precision used to multiply and store the probabilities.
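To make the floating-point argument concrete, here is a toy NumPy sketch (made-up numbers, not the model’s real logits): the same reduction accumulated in two different orders rounds to slightly different float32 values, and if two tokens’ logits are nearly tied, that tiny difference is enough to flip the greedy argmax.

import numpy as np

rng = np.random.default_rng(0)
contribs = rng.normal(size=100_000).astype(np.float32)

# The "same" reduction, accumulated in two different orders.
sum_fwd = np.float32(0.0)
for x in contribs:
    sum_fwd += x
sum_rev = np.float32(0.0)
for x in contribs[::-1]:
    sum_rev += x

# On most machines the two sums differ by a small rounding error.
print(sum_fwd, sum_rev, "difference:", sum_fwd - sum_rev)

# If another token's logit sits between the two sums, the greedy pick
# depends on which accumulation order the hardware happened to use.
other_logit = (float(sum_fwd) + float(sum_rev)) / 2.0
print("pick (fwd order):", int(np.argmax([sum_fwd, other_logit])))
print("pick (rev order):", int(np.argmax([sum_rev, other_logit])))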

It should also be noted that, since decoding is autoregressive, once a different token has been picked the whole generated sequence will diverge, because that choice affects the probability of every subsequent token.

Hope that helps :slight_smile:

4 Likes

Thanks @AgusPG for the link, it’s a very clear answer to my question.

2 Likes

This is one of those very rare situations where you may also like to play with top_p.

top_p restricts sampling to the smallest set of tokens whose cumulative probability reaches p, and renormalizes the probability over that set. It may make a difference in your case.
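Something like this, for example (same request as above, reusing API_KEY; the assumption is that a very small top_p leaves only the single most likely token in the nucleus):

res = requests.post(
    "https://api.openai.com/v1/completions",
    json={
        "prompt": "A logical",
        "model": "text-davinci-003",
        "temperature": 0.0,
        "top_p": 1e-9,   # keep only the most probable token in the nucleus
        "max_tokens": 200,
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
)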

1 Like

(EDITED)
IMHO the response of AgusPG seems funky.
For the same prompt with the same settings (temp = 0, top_p narrowed to only the top result), on the same hardware, where in the GPT process would there be a chance of a different floating-point result? The processing would happen in the same order; there is no randomized solver involved anymore.
Or does that imply - if I follow the links further - that the server infrastructure mixes and matches different GPU types, drivers, …?

It’s possible, but I think it’s more likely that a temperature of 0 is treated as very close to deterministic rather than exactly 0, perhaps to avoid a division-by-zero issue, so only occasionally do you get a token flip and then a different sequence after that point.
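For illustration only, a sketch of that hypothesis (the clamping behaviour and the eps value are pure speculation, not anything documented): if the logits were divided by max(temperature, eps) rather than special-casing 0, near-tied tokens could still occasionally swap.

import numpy as np

def sample_clamped(logits, temperature, rng, eps=1e-6):
    """Hypothetical: clamp the temperature to eps instead of special-casing 0."""
    t = max(temperature, eps)                      # avoids division by zero
    scaled = np.asarray(logits, dtype=np.float64) / t
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
near_tied = [5.000002, 5.0, -3.0]                  # top two tokens almost tied
picks = [sample_clamped(near_tied, 0.0, rng) for _ in range(100)]
print(np.bincount(picks, minlength=3))             # mostly token 0, a few flips to token 1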