Logprobs keep changing when using the same prompt in chat.completion

Hi there,

I am doing zero-shot classification with GPT-4. The task is binary classification (safe/unsafe).

However, the log probability returned for the same query keeps changing (see the outputs of the same prompt below). I’ve already fixed the seed, temperature, etc.

Does anyone know why this happens and how to fix this?
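
For reference, a minimal sketch of how each request is made, assuming the standard openai-python v1 client (the actual classification prompt is omitted and the seed value here is just a placeholder):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": "<classification prompt omitted>"}],
    temperature=0,            # greedy decoding
    seed=42,                  # placeholder; any fixed seed shows the same behaviour
    logprobs=True,            # return logprobs for the sampled tokens
    top_logprobs=1,           # also return the single most likely alternative
)

# Logprob of the first (and only) completion token, e.g. "unsafe"
print(response.choices[0].logprobs.content[0].logprob)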

Output of 1st query (logprob of the “unsafe” token=-8.418666e-06)

{'id': 'chatcmpl-8z7xtueIrmfQppqKtqMCDuif7LtkJ',
 'object': 'chat.completion',
 'created': 1709579317,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -8.418666e-06,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -8.418666e-06,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_70b2088885'}

Output of 2nd query (logprob of the “unsafe” token=-0.000444374)

{'id': 'chatcmpl-8z7yFePnDrC1THreCvjrBV9dEsj0L',
 'object': 'chat.completion',
 'created': 1709579339,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -0.000444374,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -0.000444374,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_32a098fbf7'}

Output of 3rd query (logprob of the “unsafe” token=-7.7318386e-05)

{'id': 'chatcmpl-8z7ynphs2FZdoPAx48HT9GUk8xhL9',
 'object': 'chat.completion',
 'created': 1709579373,
 'model': 'gpt-4-0125-preview',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': 'unsafe'},
   'logprobs': {'content': [{'token': 'unsafe',
      'logprob': -7.7318386e-05,
      'bytes': [117, 110, 115, 97, 102, 101],
      'top_logprobs': [{'token': 'unsafe',
        'logprob': -7.7318386e-05,
        'bytes': [117, 110, 115, 97, 102, 101]}]}]},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 322, 'completion_tokens': 1, 'total_tokens': 323},
 'system_fingerprint': 'fp_00ceb2df5b'}

This is a known “feature” of all the chat models. The last models that would produce the same output for a given input were the retired GPT-3 series.

Those logprobs correspond to probabilities of [0.99999158, 0.99955572, 0.99992268] for queries 1–3, so we can be pretty sure the top token is adequately differentiated from the at most ~0.0005 probability mass left to all other tokens :wink:
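
To get those numbers, just exponentiate the returned logprobs:

import math

logprobs = [-8.418666e-06, -0.000444374, -7.7318386e-05]   # queries 1-3
print([round(math.exp(lp), 8) for lp in logprobs])
# [0.99999158, 0.99955572, 0.99992268]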


Is there any way to make the logprobs deterministic?

Nope. You are seeing the results of calculations done inside the model, read out right after the softmax. OpenAI hasn’t said whether the nondeterminism is down to software or hardware, and is unlikely to have anyone who both knows the architectural reason and is in a position to talk about it.
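
If what you actually need is a stable probability estimate rather than bit-for-bit reproducibility, one workaround (my own sketch, nothing official; the repeat count and seed are arbitrary) is to repeat the identical request and average exp(logprob):

import math
from openai import OpenAI

client = OpenAI()

def classify(messages, n_repeats=5, model="gpt-4-0125-preview"):
    # Repeat the same request and average the probability of the returned
    # token. In the outputs above the sampled label is identical across
    # runs and only the logprob jitters, so averaging smooths the noise.
    probs = []
    for _ in range(n_repeats):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
            seed=42,          # placeholder seed; it does not pin the logprobs
            logprobs=True,
        )
        token_info = response.choices[0].logprobs.content[0]
        probs.append((token_info.token, math.exp(token_info.logprob)))

    label = probs[0][0]
    mean_prob = sum(p for _, p in probs) / len(probs)
    return label, mean_prob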