# Seeking Assistance on Achieving Determinism in OpenAI Models

Hello community,

I’m currently working on a project that requires generating 100% reproducible outputs from OpenAI’s GPT-4 model for the same input prompt. Despite experimenting with various parameters like `temperature`, `top_p`, `max_tokens`, and setting a `seed`, I have not been able to achieve complete determinism.

Even with these settings, there are still slight variations in the outputs. I understand that some level of stochasticity is inherent in these models, but I’m looking for any additional tips, tricks, or best practices that might help me achieve more consistent results.

Has anyone in the community successfully managed to get fully deterministic outputs from GPT-4? If so, could you please share your approach or any insights that might help?


This, unfortunately, isn’t really possible at this time.

The best you can do is set

• `temperature = 0`
• and a constant `seed` value.

This will give you results that are as consistent as possible.
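As a sketch, those settings might look like this with the OpenAI Python SDK; the model name, seed value, and prompt here are placeholders, not recommendations:

```python
# Hypothetical request parameters for maximum reproducibility.
# The exact model name and seed value are arbitrary placeholders.
params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0,  # pick the most likely token at each step
    "seed": 42,        # keep the same constant seed across runs
}

# With an instantiated client, the call would look roughly like:
# response = client.chat.completions.create(**params)
# The response's `system_fingerprint` field changes when the backend
# configuration changes, which is one source of run-to-run variation.
```

Even with these parameters, the API only offers best-effort determinism: backend changes (visible via `system_fingerprint`) can still alter outputs.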

Assuming that a fixed seed and zero temperature make the responses as deterministic as possible, you can take multiple samples of each generation and keep the majority result. If you set something like `n = 13`, then with two possible outcomes occurring at frequencies of 90% and 10%, the probability of the less likely outcome appearing 7 or more times is approximately 0.0001.

Increase `n` based on how prevalent your second most likely result is relative to your first and how certain you want to be in your result being the same across runs.
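The 0.0001 figure above is a binomial upper tail. A small sketch of that arithmetic (the helper names are mine), including a search for the smallest odd `n` that meets a target failure probability:

```python
from math import comb


def minority_majority_prob(n: int, p_minor: float) -> float:
    """Probability that the less likely outcome (frequency p_minor)
    wins a strict majority of n independent samples."""
    k = n // 2 + 1  # e.g. 7 out of 13
    return sum(comb(n, i) * p_minor**i * (1 - p_minor) ** (n - i)
               for i in range(k, n + 1))


def smallest_odd_n(p_minor: float, target: float) -> int:
    """Smallest odd sample count whose majority vote picks the
    minority outcome with probability below `target`."""
    n = 1
    while minority_majority_prob(n, p_minor) >= target:
        n += 2
    return n


print(round(minority_majority_prob(13, 0.10), 6))  # ≈ 0.000099
print(smallest_odd_n(0.10, 1e-4))                  # 13
```

So with a 10% minority outcome, `n = 13` is indeed the smallest odd sample count that pushes the failure probability below one in ten thousand.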

But, this supposes consistency is more important than cost.
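Picking the most common of the `n` samples is a one-liner with `collections.Counter`. The API call in the comment is a hedged sketch assuming the OpenAI SDK's `n` parameter, which returns multiple choices from a single request:

```python
from collections import Counter


def majority_output(outputs: list[str]) -> str:
    """Return the most frequent completion among the sampled outputs."""
    return Counter(outputs).most_common(1)[0][0]


# Sketch only (assumes an instantiated `client`; model and seed are
# placeholders):
# resp = client.chat.completions.create(
#     model="gpt-4", messages=msgs, n=13, temperature=0, seed=42)
# answer = majority_output([c.message.content for c in resp.choices])

print(majority_output(["yes"] * 9 + ["no"] * 4))  # prints "yes"
```

Note that one request with `n = 13` bills you for 13 completions' worth of output tokens, which is the cost trade-off mentioned above.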


Have you tested this?

I don’t believe this will work. At least on Azure, with a high n you either get a stable instance or a busted one. In the majority of cases, a high n (100) will yield either a consistent result or a busted result, and neither the busted nor the consistent result distribution tells you anything about the global distribution.

(7 tries, n=100: one busted, 6 stable, total N = 700). Neither 7 nor 42 showed up in any of the 7 batches.

If you try it with a smaller n, you might get completely different results:

(56 tries, n=10: 51 normal, 50 off, total N = 560). There are no mixed batches here either.

```
query_model_gpt_4o("can you give me a random number? only print the number and nothing else.", 10)
```

Also, seed doesn’t do anything here because it only affects the sampler, but setting it probably won’t hurt anything.