Deterministic Results Impossible for GPT-4o

I’ve been trying, without success, to get GPT-4o via the Chat Completions API to return the same response for the same API arguments. Setting a seed and temperature to 0 results in slightly more deterministic behavior, but a large amount of randomness still remains.

Experiment: 10 API calls to GPT-4o with Same Arguments

Results: 5 unique responses

  • 1 response repeated 4 times
  • 1 response repeated 3 times
  • 3 responses appearing once each

The system fingerprint was the same for all 10 API calls.

Here are the arguments I passed to the chat completion endpoint for this experiment:

{
  "model": "gpt-4o-2024-08-06",
  "messages": [{"role": "user", "content": "same message every time"}],
  "temperature": 0,
  "top_p": null,
  "max_tokens": 2000,
  "stream": true,
  "seed": 1
}
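
For reference, here is roughly how the experiment can be reproduced with the openai Python SDK (v1.x); the prompt is a placeholder and streaming is omitted for brevity:

# Count unique responses across 10 identical requests.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

responses = Counter()
for _ in range(10):
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "same message every time"}],
        temperature=0,
        max_tokens=2000,
        seed=1,
    )
    responses[completion.choices[0].message.content] += 1

print(f"{len(responses)} unique responses out of 10 calls")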

Is it impossible to get GPT-4o via the Chat Completions API to return the same response given the same API arguments?


It is possible to get the same response when the prompt has a single clearly correct answer and only one way to write it.

You are correct, though: no parameter can produce truly repeatable results, not even discarding or binning by system fingerprint on calls made within the same minute.

Request logprobs; you will be able to see the variations between calls in the token values.
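
For example, something along these lines with the Python SDK (same placeholder prompt as above) prints the token-level logprobs for each call:

# Request logprobs so token-level differences between calls become visible.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "same message every time"}],
    temperature=0,
    seed=1,
    logprobs=True,
    top_logprobs=2,
)

for token in completion.choices[0].logprobs.content:
    print(token.token, token.logprob)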

No explanation of the cause of non-determinism in generation has ever been offered, and a seed parameter cannot overcome it. The last model that was deterministic was text-davinci-003.

If you are sending the same messages input and want the same response you know you received before, you can simply hash the request and cache the output.
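
A minimal sketch of that idea (in-memory cache only; the function and variable names are just illustrative):

# Return the cached response when the exact same request arguments are seen again.
import hashlib
import json

_cache = {}

def cached_completion(client, **kwargs):
    key = hashlib.sha256(
        json.dumps(kwargs, sort_keys=True, default=str).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(**kwargs)
    return _cache[key]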


Hi,

The nature of GPU clusters and the batching of API jobs means that variation can occur between runs. To get fully deterministic behaviour, the GPUs would need to be synchronised, and doing that would mean running at lower clock speeds, reducing performance.

The upshot of this is that the models will always have some degree of variation in output across calls.


OK, so would this be a workaround for you, or is saving a response not possible?

if prompt == same_prompt_1:
    return saved_output_1

It would also probably be cheaper, or am I missing something?

Appreciate the suggestion, but I’m looking for ways to make LLMs deterministic in their responses, not a hash-and-cache approach.

Perhaps you could go one layer out and describe the broader problem you are trying to solve?

Why do you feel determinism is necessary?


(Not even my local llama3.2 appears to be deterministic with temp 0 and top_p 0.9.)