Deterministically different responses when calling 3.5-turbo from different locations

theofficialjeffan · December 9, 2023, 7:48am

I’m noticing a weird issue where with the same prompt, same seed, and temperature = 0, I’m getting different prompt completions when I send the request to GPT 3.5 turbo from 2 different locations.

The first location is my laptop, the second is a Github Actions virtual machine (which maybe lives in Azure and so might have some special connection to gpt?).

What’s weird is responses from both locations are also deterministic…and yet different from each other. I’ve hashed the input prompt to ensure the inputs are equal, and also made sure the system fingerprints are equal from both locations as well.

Can anyone explain this?

_j · December 9, 2023, 9:09am

The next thought I have is libraries, and logging what is actually being sent to the AI.

Python has client.chat.completions.with_raw_response.create(), delivering an APIresponse object that lets you retrieve the httpx request itself.

If the geography routes you to datacenters, one might suppose that there could be something different about the system random method where seed doesn’t take the same effect. An out-there theory. You can instead try top-p=0.0000001 without temperature or seed to force an answer as deterministic as possible.

moonlockwood · December 9, 2023, 10:03am

I definitely cannot explain this. However… Azure instances will be on slightly different hardware and with slightly different systems generating and feeding the tokens in.

I do a lot of testing with small neural networks and I have noticed (confirmed) that running the exact same nn on different gpus alters the outputs. There’s a lot going on in the tails of the vectors (as you can see when you quantize llms) so floating point accuracy will affect how an nn processes data. This is not computing as we have experienced it in the past, it’s all probabilities so variability is inherent. My mind has come to think of it as ‘soft’ computing/programming. Soft as in squishy, and traditional programming (everything up until now) is ‘hard’ - when you ask for a zero, you get a zero. No ifs ands or buts - unless you made a mistake. That’s not how this stuff is in my experience.

Topic		Replies	Views
Observing discrepancy in completions with temperature = 0 API	9	17372	February 6, 2024
Run same query many times - different results API	11	7926	December 21, 2023
ChatCompletions are not deterministic even with seed set, temperature=0, top_p=0, n=1 API gpt-4 , api	9	1688	October 7, 2024
Non-deterministic probabilities for first generated token in chat.completion? API	4	850	April 24, 2024
Parallel API Calls for the Same User Query Result in Inconsistent Responses API api	3	285	October 17, 2024

Deterministically different responses when calling 3.5-turbo from different locations

Related topics