Logprobs keep changing when using the same prompt in chat.completion

Hi there,

I am doing zero-shot classification with GPT-4. The task is binary classification (safe/unsafe).

However, the log probability returned for the same query keeps changing (see the outputs of the same prompt below). I’ve already fixed the seed, temperature, etc.

Does anyone know why this happens and how to fix this?
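
For reference, a minimal sketch of how each request is made, assuming the standard openai-python v1 client (the actual classification prompt is omitted and the seed value here is just a placeholder):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": "<classification prompt omitted>"}],
    temperature=0,            # greedy decoding
    seed=42,                  # placeholder; any fixed seed shows the same behaviour
    logprobs=True,            # return logprobs for the sampled tokens
    top_logprobs=1,           # also return the single most likely alternative
)

# Logprob of the first (and only) completion token, e.g. "unsafe"
print(response.choices[0].logprobs.content[0].logprob)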

Output of 1st query (logprob of the “unsafe” token=-8.418666e-06)

{'id': 'chatcmpl-8z7xtueIrmfQppqKtqMCDuif7LtkJ',
 'object': 'chat.completion',
 'created': 1709579317,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -8.418666e-06,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -8.418666e-06,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_70b2088885'}

Output of 2nd query (logprob of the “unsafe” token=-0.000444374)

{'id': 'chatcmpl-8z7yFePnDrC1THreCvjrBV9dEsj0L',
 'object': 'chat.completion',
 'created': 1709579339,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -0.000444374,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -0.000444374,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_32a098fbf7'}

Output of 3rd query (logprob of the “unsafe” token=-7.7318386e-05)

{'id': 'chatcmpl-8z7ynphs2FZdoPAx48HT9GUk8xhL9',
 'object': 'chat.completion',
 'created': 1709579373,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -7.7318386e-05,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -7.7318386e-05,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_00ceb2df5b'}

This is a known “feature” of all the chat models. The last models that would produce the same output for a given input were the retired GPT-3 series.

Those logprobs correspond to probabilities of [0.99999158, 0.99955572, 0.99992268] for queries 1–3, so we can be pretty sure the top token is adequately differentiated from the at most ~0.0005 probability mass left to all other tokens :wink:
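
To get those numbers, just exponentiate the returned logprobs:

import math

logprobs = [-8.418666e-06, -0.000444374, -7.7318386e-05]   # queries 1-3
print([round(math.exp(lp), 8) for lp in logprobs])
# [0.99999158, 0.99955572, 0.99992268]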


Is there any way to make the logprobs deterministic?

Nope. You are seeing the results of calculations done inside the model, read out right after the softmax. OpenAI hasn’t said whether the nondeterminism is down to software or hardware, and is unlikely to have anyone who both knows the architectural reason and is in a position to talk about it.
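
If what you actually need is a stable probability estimate rather than bit-for-bit reproducibility, one workaround (my own sketch, nothing official; the repeat count and seed are arbitrary) is to repeat the identical request and average exp(logprob):

import math
from openai import OpenAI

client = OpenAI()

def classify(messages, n_repeats=5, model="gpt-4-0125-preview"):
    # Repeat the same request and average the probability of the returned
    # token. In the outputs above the sampled label is identical across
    # runs and only the logprob jitters, so averaging smooths the noise.
    probs = []
    for _ in range(n_repeats):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
            seed=42,          # placeholder seed; it does not pin the logprobs
            logprobs=True,
        )
        token_info = response.choices[0].logprobs.content[0]
        probs.append((token_info.token, math.exp(token_info.logprob)))

    label = probs[0][0]
    mean_prob = sum(p for _, p in probs) / len(probs)
    return label, mean_prob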