I’m currently working on a project that requires generating 100% reproducible outputs from OpenAI’s GPT-4 model for the same input prompt. Despite experimenting with parameters like temperature, top_p, and max_tokens, and setting a fixed seed, I have not been able to achieve complete determinism.
Even so, there are still slight variations in the outputs. I understand that some level of stochasticity is inherent in these models, but I’m looking for any additional tips, tricks, or best practices that might help me achieve more consistent results.
Has anyone in the community successfully managed to get fully deterministic outputs from GPT-4? If so, could you please share your approach or any insights that might help?
This, unfortunately, isn’t really possible at this time.
The best you can do is set temperature = 0 and a constant seed value. This will give you results that are as consistent as possible.
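A minimal sketch of what that looks like with the OpenAI Python SDK (v1.x). Note that seed is only best-effort: the system_fingerprint field in the response indicates the backend configuration, and a change in it between calls can explain residual drift. The model name and prompt here are placeholders.

```python
import os

# Pin the two knobs that matter for reproducibility: zero temperature
# and a constant seed across runs.
params = dict(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0,  # greedy-ish decoding
    seed=42,        # constant seed value
)

# Only issue the request if credentials are available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(**params)
    print(resp.choices[0].message.content)
    # Compare this across runs: if it changes, the backend changed.
    print(resp.system_fingerprint)
```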
Under the assumption that a fixed seed and zero temperature make the responses as deterministic as possible, you can take multiple samples per generation and use a majority vote. If you set something like n = 13, then with two possible outcomes occurring 90% and 10% of the time, the probability of the less likely outcome appearing 7 or more times is approximately 0.0001.
Increase n based on how prevalent your second-most-likely result is relative to your first, and on how certain you want to be that your result will be the same across runs.
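The failure probability above is just a binomial tail. A small sketch, assuming an idealized model with exactly two possible outputs (the function name is mine):

```python
from math import comb

def minority_majority_prob(n: int, p_minority: float) -> float:
    """Probability that the less likely of two outputs wins a
    majority vote over n independent samples (n odd)."""
    threshold = n // 2 + 1  # smallest count that constitutes a majority
    return sum(
        comb(n, k) * p_minority**k * (1 - p_minority) ** (n - k)
        for k in range(threshold, n + 1)
    )

# n = 13 with a 10% minority outcome: failure probability ~0.0001
print(minority_majority_prob(13, 0.10))
```

Plugging in your own estimate of the minority frequency lets you size n for whatever failure rate you can tolerate.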
But this supposes consistency is more important than cost.
I don’t believe this will work. At least on Azure, with a high n you get either a stable instance or a busted one. In the majority of cases, a high n (e.g., 100) will yield either a consistent result or a busted result, and neither the busted nor the consistent result distribution yields any information about the global distribution.