Logprobs inconsistent between runs for 4o

_j · September 10, 2024, 8:24pm

All AI models that OpenAI currently runs are non-deterministic. You give them the same input, they return different logprob values each time.

This variation is even higher in the newest models. You can specify either 0 or miniscule small values for top_p and temperature, and get answers that diverge pretty quickly.

Temperature 0 is not as good as temperature 0.0000000001 for some reason. top_p at an extremely low number is a stronger enforcer of only getting the top value back.

The answer overall though is perplexity. The less expensive AI is less clear and certain how to score tokens (unless it is a particular post-training chat behavior), and so the values of logprobs end up being closer together and easy for one to overtake another between runs.

Asking for logprobs doesn’t change the behavior. It does let you see how close “Yes” was to “No”, though, or to “yes” or “I’m sorry”. That can give you insight that you need to make your prompting and desired output clearer, or that the AI just has no good truthfulness score for you from the facts.

Topic		Replies	Views
Logprobs and message.content are inconsistent API gpt-4 , api , logprobs	6	2126	April 11, 2024
Why does the answer vary for the same question asked multiple times Community api	8	4139	May 22, 2024
Achieving deterministic API output on language models - HOWTO API statistics	2	10350	October 18, 2023
Why the API output is inconsistent even after the temperature is set to 0 API gpt-4	10	26642	August 26, 2023
Non-deterministic probabilities for first generated token in chat.completion? API	4	1151	April 24, 2024

Logprobs inconsistent between runs for 4o

Related topics