Parallel API Calls for the Same User Query Result in Inconsistent Responses

If you want to always get the “best” output for a given input - responses that start the same (but may still diverge) - set top_p: 0.00001. A nucleus that small restricts sampling to only the single most probable token at each step.
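As a rough illustration, here is a minimal sketch using the openai Python SDK (v1.x); the model name is just a placeholder:

```python
# Sketch: near-deterministic sampling via a tiny top_p (openai Python SDK, v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name, swap in your own
    messages=[{"role": "user", "content": "Name three uses for a paperclip."}],
    top_p=0.00001,  # nucleus cutoff so tight that only the top token survives
)
print(response.choices[0].message.content)
```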

A seed parameter with a fixed value, if supplied and re-supplied to the API, will re-run the sampler with the same random component (randomness essentially already turned off by the top_p above). But if the underlying computations are not identical between runs (and they aren’t), the seed doesn’t carry as much meaning.
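A sketch of what “re-supplying” looks like in practice, again assuming the openai Python SDK (v1.x) and an example model name; even with identical parameters, the two outputs are not guaranteed to match:

```python
# Sketch: re-sending the same request with a fixed seed. Determinism is
# best-effort only; non-identical backend computations can still diverge.
from openai import OpenAI

client = OpenAI()

def ask(seed: int) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Give me one startup idea."}],
        top_p=0.00001,        # randomness effectively off
        seed=seed,            # fixed seed for best-effort repeatability
    )
    return response.choices[0].message.content

first = ask(seed=12345)
second = ask(seed=12345)
print("identical:", first == second)  # may still print False
```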

Thus, inconsistent responses are expected behavior. You can resend the same request and possibly get a better answer, or different brainstorming ideas.

There’s not much more to read: OpenAI hasn’t come out and directly explained the technical reason why logits and embedding values vary between runs on its language and embeddings models since gpt-3.5 onward.

You can look at the system_fingerprint value returned in an API response to see whether repeatability is further not to be expected, perhaps because the model is running on a different server architecture (some models have returned up to five distinct fingerprints in large trials).
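A quick way to gauge this yourself is to count distinct fingerprints over repeated calls. A minimal sketch, again with the openai Python SDK (v1.x) and a placeholder model name:

```python
# Sketch: tally system_fingerprint across repeated calls to see how many
# backend configurations served the request.
from collections import Counter

from openai import OpenAI

client = OpenAI()

fingerprints = Counter()
for _ in range(20):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Say hello."}],
        seed=12345,
        max_tokens=5,
    )
    fingerprints[response.system_fingerprint] += 1

# More than one distinct fingerprint suggests different server configurations,
# and lowered expectations of repeatability even with a fixed seed.
print(fingerprints)
```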
