Parallel API Calls for the Same User Query Result in Inconsistent Responses

If you want to always get the “best” output for a given input - responses that start the same (but may still diverge) - set top_p: 0.00001. A nucleus that small restricts sampling to only the single most probable token at each step.
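As a rough illustration, here is a minimal sketch using the openai Python SDK (v1.x); the model name is just a placeholder:

```python
# Sketch: near-deterministic sampling via a tiny top_p (openai Python SDK, v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name, swap in your own
    messages=[{"role": "user", "content": "Name three uses for a paperclip."}],
    top_p=0.00001,  # nucleus cutoff so tight that only the top token survives
)
print(response.choices[0].message.content)
```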

A seed parameter with a fixed value, if supplied and re-supplied to the API, will re-run the sampler with the same random component (randomness essentially already turned off by the top_p above). But if the underlying computations are not identical between runs (and they aren’t), the seed doesn’t carry as much meaning.
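A sketch of what “re-supplying” looks like in practice, again assuming the openai Python SDK (v1.x) and an example model name; even with identical parameters, the two outputs are not guaranteed to match:

```python
# Sketch: re-sending the same request with a fixed seed. Determinism is
# best-effort only; non-identical backend computations can still diverge.
from openai import OpenAI

client = OpenAI()

def ask(seed: int) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Give me one startup idea."}],
        top_p=0.00001,        # randomness effectively off
        seed=seed,            # fixed seed for best-effort repeatability
    )
    return response.choices[0].message.content

first = ask(seed=12345)
second = ask(seed=12345)
print("identical:", first == second)  # may still print False
```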

Thus, inconsistent responses are expected behavior. You can resend the same request and possibly get a better answer, or different brainstorming ideas.

There’s not much more to read: OpenAI hasn’t come out and directly explained the technical reason why logits and embedding values vary between runs on its language and embeddings models since gpt-3.5 onward.

You can look at the system_fingerprint value returned in an API response to see whether repeatability is further not to be expected, perhaps because the model is running on a different server architecture (some models have returned up to five distinct fingerprints in large trials).
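A quick way to gauge this yourself is to count distinct fingerprints over repeated calls. A minimal sketch, again with the openai Python SDK (v1.x) and a placeholder model name:

```python
# Sketch: tally system_fingerprint across repeated calls to see how many
# backend configurations served the request.
from collections import Counter

from openai import OpenAI

client = OpenAI()

fingerprints = Counter()
for _ in range(20):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Say hello."}],
        seed=12345,
        max_tokens=5,
    )
    fingerprints[response.system_fingerprint] += 1

# More than one distinct fingerprint suggests different server configurations,
# and lowered expectations of repeatability even with a fixed seed.
print(fingerprints)
```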
