I’ve been trying to get GPT-4o, via the Chat Completions API, to return the same response for the same API arguments, without success. Setting a seed and temperature to 0 has produced slightly more deterministic behavior, but a large amount of randomness remains.
Experiment: 10 API calls to GPT-4o with Same Arguments
Results: 5 unique responses
- 1 response repeated 4 times
- 1 response repeated 3 times
- 3 responses occurring once each
The system fingerprint was the same for all 10 API calls.
Here are the arguments I passed to the chat completion endpoint for this experiment:
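(The original argument list is not reproduced here.) As a hedged sketch of such an experiment, assuming the official `openai` Python SDK and purely illustrative parameter values (`seed=42`, `temperature=0` are assumptions, not the poster's actual arguments), the 10-call tally could look like:

```python
from collections import Counter

def tally_responses(responses):
    """Count how many distinct responses appeared and how often each repeated."""
    counts = Counter(responses)
    return len(counts), sorted(counts.values(), reverse=True)

# With the openai SDK, each of the 10 calls would look roughly like
# (illustrative values, not the original arguments):
#   resp = client.chat.completions.create(
#       model="gpt-4o", messages=messages, seed=42, temperature=0)
# collecting resp.choices[0].message.content into `responses`.

# Feeding in the distribution reported above:
responses = ["A"] * 4 + ["B"] * 3 + ["C", "D", "E"]
unique, repeats = tally_responses(responses)
# unique -> 5, repeats -> [4, 3, 1, 1, 1]
```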
It is possible to get the same response when the prompt has a single, clearly correct answer and only one natural way to write it.
You are correct, though: no parameter can produce truly repeatable results, not even discarding or binning responses by system fingerprint on calls made within the same minute.
Request logprobs; you will be able to see the variation between calls directly in the token probability values.
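With the `openai` SDK this means passing `logprobs=True` on the call; the helper below (a sketch I am adding, not from the thread) then locates the first token position where two runs' logprob sequences diverge:

```python
import math

def first_divergence(logprobs_a, logprobs_b, tol=1e-9):
    """Return the index of the first token position where two runs'
    logprob sequences differ beyond `tol`, or None if identical."""
    for i, (a, b) in enumerate(zip(logprobs_a, logprobs_b)):
        if not math.isclose(a, b, abs_tol=tol):
            return i
    if len(logprobs_a) != len(logprobs_b):
        # One response is a prefix of the other; they diverge at the shorter length.
        return min(len(logprobs_a), len(logprobs_b))
    return None

# The sequences themselves would come from two API calls, e.g.:
#   resp = client.chat.completions.create(
#       model="gpt-4o", messages=messages, logprobs=True, ...)
#   seq = [t.logprob for t in resp.choices[0].logprobs.content]
```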
No explanation of the cause of this non-determinism in generation has ever been offered, and a seed parameter cannot overcome it. The last model that was deterministic was text-davinci-003.
If you are sending the same messages input and want the same response you received before, you can simply hash the request and cache the result.
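A minimal sketch of that hash-and-cache approach (the function names and the injected `call_api` callable are my own illustration, not an SDK API):

```python
import hashlib
import json

_cache = {}

def cache_key(model, messages, **params):
    # Canonical JSON of the full request; sort_keys makes the key
    # stable regardless of parameter ordering.
    payload = json.dumps(
        {"model": model, "messages": messages, **params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(model, messages, call_api, **params):
    """Return a cached response for an identical request; invoke the
    API (via the injected `call_api` callable) only on a cache miss."""
    key = cache_key(model, messages, **params)
    if key not in _cache:
        _cache[key] = call_api(model=model, messages=messages, **params)
    return _cache[key]
```

Repeated identical requests then return the stored response without ever hitting the non-deterministic endpoint again.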
The nature of GPU clusters and the batching of API jobs means that variation can occur between runs. To get fully deterministic behaviour, the GPUs would need to be synchronised, and doing that would mean running at lower clock speeds, reducing performance.
The upshot of this is that the models will always show some degree of variation in output across calls.
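The batching effect above can be illustrated in miniature: floating-point addition is not associative, so reducing the same operands in a different order (as different batch layouts on a GPU will do) can yield a different result. A minimal Python demonstration:

```python
# Floating-point addition is not associative: grouping the same three
# operands differently changes the result, because 1.0 is smaller than
# the spacing between representable doubles near 1e16.
a, b, c = 1e16, -1e16, 1.0

left_grouped = (a + b) + c   # (0.0) + 1.0  -> 1.0
right_grouped = a + (b + c)  # the 1.0 is absorbed into -1e16 -> 0.0

assert left_grouped != right_grouped
```

Scaled up to the thousands of reductions inside a transformer forward pass, tiny differences like this can flip a borderline token choice, after which the two generations diverge completely.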