Hi,
I wrote some code following the docs to check whether I can get the same output by setting the “seed” parameter, but the output still differs between requests. Both the “gpt-4-1106-preview” and “gpt-3.5-turbo” models give unreproducible results even when all of the inputs and the seed are identical.
Am I misunderstanding how the seed parameter is supposed to be used?
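For reference, here is roughly the check I'm running: a minimal sketch against the v1 Python SDK, where the prompt, seed value, and max_tokens are arbitrary placeholders:

```python
from openai import OpenAI

client = OpenAI()

def ask(seed: int) -> tuple[str, str]:
    """Send a fixed request and return (output text, system_fingerprint)."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[{"role": "user", "content": "Write a haiku about the sea."}],
        seed=seed,
        temperature=0,
        max_tokens=100,
    )
    return resp.choices[0].message.content, resp.system_fingerprint

out1, fp1 = ask(seed=42)
out2, fp2 = ask(seed=42)
print("fingerprints match:", fp1 == fp2)
print("outputs match:     ", out1 == out2)  # I expected True here
```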
I can confirm that seeds aren't the only thing not working: setting the temperature to 0 doesn't produce deterministic results either, so there may be a deeper issue affecting generations.
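Here is the kind of probe I used to check that, a sketch assuming the v1 Python SDK (the prompt and run count are arbitrary): fire the exact same temperature-0 request several times and count distinct completions.

```python
from openai import OpenAI

client = OpenAI()

# Send an identical temperature-0 request several times and count how
# many distinct completions come back; a deterministic model would
# yield exactly one.
outputs = set()
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "List three prime numbers."}],
        temperature=0,
    )
    outputs.add(resp.choices[0].message.content)

print(f"{len(outputs)} distinct output(s) across 5 identical requests")
```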
I have the same understanding as you.
I opened an issue on their Python SDK repository (openai-python/issues/708), although the problem itself is on the server side.
I just checked their cookbook “deterministic_outputs_with_the_seed_parameter” again, and it mentions that:
If the seed, request parameters, and system_fingerprint all match across your requests, then model outputs will mostly be identical. There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of computers.
I tried gpt-3.5-turbo-1106 several times and could not get exactly the same result (the system_fingerprint was identical across all requests). In my tests, the outputs match only partially: the first line, or maybe the first two lines, are the same, but the rest of the output differs.
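To see where exactly they diverge, I compare completions pairwise with something like this (plain Python, no API calls; `a` and `b` stand for two completions captured from identical requests):

```python
import os

def first_divergence(a: str, b: str) -> None:
    """Report how far two completions agree before they split apart."""
    prefix = os.path.commonprefix([a, b])
    shared_lines = prefix.count("\n")
    print(f"identical for {len(prefix)} characters (~{shared_lines} full lines)")
    print("a continues:", a[len(prefix):len(prefix) + 40], "...")
    print("b continues:", b[len(prefix):len(prefix) + 40], "...")
```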
I’ve figured out that the issue with not reproducing the same output with a defined seed is related to calculating question embeddings, which tend to produce varying results. So for testing purposes, to achieve comparable results, you need to retrieve the embeddings once and then reuse (mock) them for your defined set of test questions. You can find more about this issue in the thread below:
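A rough sketch of what I mean, assuming the v1 Python SDK (the cache path, embedding model, and question list are placeholders): compute the embeddings once, persist them, and have later test runs load the saved values instead of calling the API again.

```python
import json
from pathlib import Path

from openai import OpenAI

CACHE = Path("test_embeddings.json")  # placeholder cache location
client = OpenAI()

def get_test_embeddings(questions: list[str]) -> dict[str, list[float]]:
    """Return embeddings for the test questions, computing them only once."""
    if CACHE.exists():
        return json.loads(CACHE.read_text())
    resp = client.embeddings.create(model="text-embedding-ada-002", input=questions)
    embeddings = {q: item.embedding for q, item in zip(questions, resp.data)}
    CACHE.write_text(json.dumps(embeddings))
    return embeddings

# Every run after the first reuses the frozen embeddings, so any
# remaining output variation can't be blamed on retrieval.
vectors = get_test_embeddings(["What is the seed parameter for?"])
```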