I’m noticing that when I run the same prompt multiple times with GPT-4o-2024-08-06, the outputs aren’t always identical, even though I’ve set the temperature to 0. Is there anything more I can do to make the output fully consistent from run to run?
You can also set top_p to zero, but it likely wouldn’t make that much of a difference.
The outputs aren’t guaranteed to be deterministic. The best way to get “deterministic” outputs (on top of disabling the sampler with temperature and top_p at 0) is to structure your prompt so that a particular output is all but guaranteed.
e.g.:

Prompt: What do you call a group of fish?

Response: A group of fish is typically called a school or a shoal.

School refers to a group of fish swimming in a coordinated, synchronized way, usually of the same species, moving together for protection or to appear larger to predators.

Shoal is a more general term that applies to a group of fish swimming together loosely, without necessarily moving in the same direction or coordinating closely.

vs.:

Prompt: What do you call a group of fish? Respond only with one word, no quotes, no period.

Response: School
I know it’s not always possible, but if you can structure your prompts this way, you can achieve a decent level of determinism even across models and vendors of similar capability.
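Here’s a minimal sketch of that setup, assuming the official openai Python package and an OPENAI_API_KEY in the environment (the trial count of 5 is just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = ("What do you call a group of fish? "
          "Respond only with one word, no quotes, no period.")

outputs = set()
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # disable temperature sampling
        top_p=1e-9,     # "extremely small" stands in for 0; either effectively forces the top token
    )
    outputs.add(resp.choices[0].message.content)

print(outputs)  # a single element means all runs agreed, e.g. {'School'}
```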
Getting identical results on every run isn’t really what AI is about, but the exercise does show you how far the outputs diverge from the single algorithmically top-ranked sequence underneath.
The other thing you can do, besides setting top_p to zero or extremely small and sending a fixed seed value (which, once top_p already forces the top token every time, should have no further effect), is to look at the system_fingerprint returned with each response, which is supposed to surface model or architecture differences that affect determinism.
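As a sketch (same client and prompt as above; the seed value 12345 is arbitrary), the seed goes in as a request parameter and the fingerprint comes back on the response object:

```python
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    top_p=1e-9,
    seed=12345,  # fixed seed: a best-effort determinism hint, not a guarantee
)
print(resp.system_fingerprint)  # should change when the serving backend changes
```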
Running 30 trials of 2024-08-06 just now, mixing requests with and without prediction, I see a bunch of distinct fingerprints, enough that keeping only one of them and discarding the rest for analysis would be costly.
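If you want to reproduce that kind of count yourself, a rough sketch (same client and prompt as before; the prediction parameter is omitted here for brevity) is just to tally the fingerprints over the batch:

```python
from collections import Counter

fingerprints = Counter()
for _ in range(30):  # 30 trials, matching the test above
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=12345,
    )
    fingerprints[resp.system_fingerprint] += 1

print(fingerprints)  # more than one key means the batch hit mixed backends
```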