I’m currently working on a project that requires generating 100% reproducible outputs from OpenAI’s GPT-4 model for the same input prompt. Despite experimenting with parameters like temperature, top_p, and max_tokens, and setting a fixed seed, I have not been able to achieve complete determinism.
Even so, there are still slight variations in the outputs. I understand that some level of stochasticity is inherent in these models, but I’m looking for any additional tips, tricks, or best practices that might help me achieve more consistent results.
Has anyone in the community successfully managed to get fully deterministic outputs from GPT-4? If so, could you please share your approach or any insights that might help?
This, unfortunately, isn’t really possible at this time.
The best you can do is set temperature = 0 and a constant seed value. This will give you results that are as consistent as possible.
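A minimal sketch of what that looks like with the OpenAI Python SDK (v1.x). Note that seed is only best-effort: the system_fingerprint field in the response indicates the backend configuration, and a change in it between calls can explain residual drift. The model name and prompt here are placeholders.

```python
import os

# Pin the two knobs that matter for reproducibility: zero temperature
# and a constant seed across runs.
params = dict(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0,  # greedy-ish decoding
    seed=42,        # constant seed value
)

# Only issue the request if credentials are available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(**params)
    print(resp.choices[0].message.content)
    # Compare this across runs: if it changes, the backend changed.
    print(resp.system_fingerprint)
```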
Under the assumption that a fixed seed and zero temperature make the responses as deterministic as possible, you can take multiple samples per generation and use a majority vote. If you set something like n = 13, then with two possible outcomes occurring 90% and 10% of the time, the probability of the less likely outcome appearing 7 or more times is approximately 0.0001.
Increase n based on how prevalent your second-most-likely result is relative to your first, and on how certain you want to be that your result will be the same across runs.
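The failure probability above is just a binomial tail. A small sketch, assuming an idealized model with exactly two possible outputs (the function name is mine):

```python
from math import comb

def minority_majority_prob(n: int, p_minority: float) -> float:
    """Probability that the less likely of two outputs wins a
    majority vote over n independent samples (n odd)."""
    threshold = n // 2 + 1  # smallest count that constitutes a majority
    return sum(
        comb(n, k) * p_minority**k * (1 - p_minority) ** (n - k)
        for k in range(threshold, n + 1)
    )

# n = 13 with a 10% minority outcome: failure probability ~0.0001
print(minority_majority_prob(13, 0.10))
```

Plugging in your own estimate of the minority frequency lets you size n for whatever failure rate you can tolerate.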
But this supposes consistency is more important than cost.
I don’t believe this will work. At least on Azure, with a high n you get either a stable instance or a busted one. In the majority of cases, a high n (e.g., 100) will yield either a consistent result or a busted result, and neither the busted nor the consistent result distribution yields any information about the global distribution.