Understanding logprob and top_logprobs

I wanted to understand this concept better, hence posting it in the community. For the output below, my understanding was that 'Sym' would be picked as the next token, since it has the highest logprob, but the model picked 'COVID' instead. How do we explain this?

```python
ChatCompletionTokenLogprob(
    token='COVID', bytes=[67, 79, 86, 73, 68], logprob=-2.1399019,
    top_logprobs=[
        TopLogprob(token='Sym', bytes=[83, 121, 109], logprob=-0.7570321),
        TopLogprob(token='The', bytes=[84, 104, 101], logprob=-1.3814332),
        TopLogprob(token='COVID', bytes=[67, 79, 86, 73, 68], logprob=-2.1399019),
        TopLogprob(token='According', bytes=[65, 99, 99, 111, 114, 100, 105, 110, 103], logprob=-2.4951494),
        TopLogprob(token='People', bytes=[80, 101, 111, 112, 108, 101], logprob=-3.0639322),
    ],
)
```
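For reference, output like this comes from requesting logprobs on a chat completion. Here is a minimal sketch using the official openai Python client; the model name and prompt are illustrative assumptions, not taken from the original post:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for the log-probability of each chosen token, plus the
# five most likely alternatives at every position.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize COVID symptoms."}],
    logprobs=True,
    top_logprobs=5,
)

# Each element corresponds to one generated token.
first_token = response.choices[0].logprobs.content[0]
print(first_token.token, first_token.logprob)
for alt in first_token.top_logprobs:
    print(alt.token, alt.logprob)
```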

When the model selected the token 'COVID' to continue its message, it had several options to choose from, each with a certain log-probability of being chosen:

| Token | Logprob | Probability |
|-----------|---------|-------------|
| Sym | -0.757 | 0.469 |
| The | -1.381 | 0.251 |
| COVID | -2.140 | 0.118 |
| According | -2.495 | 0.082 |
| People | -3.064 | 0.047 |
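The probabilities here are simply the exponentials of the logprobs; a quick check reproduces the table:

```python
import math

top_logprobs = {
    "Sym": -0.7570321,
    "The": -1.3814332,
    "COVID": -2.1399019,
    "According": -2.4951494,
    "People": -3.0639322,
}

for token, logprob in top_logprobs.items():
    # probability = e ** logprob
    print(f"{token:>9}  {logprob:7.3f}  {math.exp(logprob):.3f}")
```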

So the model randomly selected a token, weighted by these probabilities, and landed on 'COVID'.

Sampling randomly selects from the set of candidate tokens, biased by the model's certainty.
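A toy sketch of that weighted draw over the distribution above (an illustration of the idea, not what the API does internally):

```python
import random

tokens = ["Sym", "The", "COVID", "According", "People"]
probs = [0.469, 0.251, 0.118, 0.082, 0.047]

# Draw one token, biased by the weights: "COVID" comes up
# roughly 12% of the time, "Sym" roughly 47% of the time.
picked = random.choices(tokens, weights=probs, k=1)[0]
print(picked)
```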

Selecting only the top choice is called "greedy sampling"; despite being more reliable, it actually doesn't produce language that seems as human.
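For comparison, greedy selection over the same toy distribution is just an argmax:

```python
probs = {"Sym": 0.469, "The": 0.251, "COVID": 0.118,
         "According": 0.082, "People": 0.047}

# Greedy sampling: always take the single most likely token.
greedy = max(probs, key=probs.get)
print(greedy)  # always "Sym"
```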

How tokens are sampled can be affected by API parameters.


To make it deterministic, could I write additional logic to pick the token with the highest logprob from top_logprobs? And will top_logprobs always return deterministic logprobs?
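For concreteness, that post-hoc logic would look roughly like the sketch below, assuming `response` comes from a request made with `logprobs=True` and `top_logprobs` set, as in the earlier example. One caveat: each later distribution was conditioned on the tokens the model actually sampled, so re-picking the argmax afterwards does not reconstruct what greedy decoding would have generated.

```python
# Reconstruct the locally-greedy choice at each position
# from the returned top_logprobs.
greedy_tokens = []
for position in response.choices[0].logprobs.content:
    best = max(position.top_logprobs, key=lambda t: t.logprob)
    greedy_tokens.append(best.token)

print("".join(greedy_tokens))
```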

Just set top_p to 0, like in what _j linked.

top_p restricts sampling to the smallest set of top tokens whose cumulative probability reaches x. 0.5 would allow "Sym" and "The"; 0.4 would only allow "Sym"; 0 would also only allow "Sym". There might be something going on in the background that rewrites top_p to something non-zero, so you can additionally set the temperature to 0 to make it even more stable (though in practice that probably doesn't make a difference).
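To make the cutoff concrete, here is a toy implementation of the usual top-p (nucleus) rule, keeping the smallest set of tokens whose cumulative probability reaches p, run against the distribution from the table above:

```python
def top_p_filter(probs, p):
    """Keep the smallest prefix of tokens (sorted by probability)
    whose cumulative probability reaches p; always keep at least one."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        total += prob
        if total >= p:
            break
    return kept

probs = {"Sym": 0.469, "The": 0.251, "COVID": 0.118,
         "According": 0.082, "People": 0.047}

print(top_p_filter(probs, 0.5))  # ['Sym', 'The']
print(top_p_filter(probs, 0.4))  # ['Sym']
print(top_p_filter(probs, 0.0))  # ['Sym']
```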

Within a model fingerprint, the logprobs should remain approximately the same. The models aren't fully deterministic (hardware-level floating-point non-determinism is one commonly cited reason), but they should generally be more or less stable.

The canonical way to accomplish this is by setting temperature = 0; this tells the model to perform greedy sampling and choose only the top token.
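In API terms, that looks like the sketch below (same illustrative model and prompt assumptions as earlier). `seed` is an optional best-effort reproducibility knob, and `system_fingerprint` on the response identifies the model configuration mentioned above:

```python
from openai import OpenAI

client = OpenAI()

# temperature=0 requests greedy decoding; adding seed and a low
# top_p further stabilizes output, though determinism is still
# not fully guaranteed across backend changes.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize COVID symptoms."}],
    temperature=0,
    top_p=0,
    seed=42,  # optional: best-effort reproducibility
)
print(response.choices[0].message.content)
print(response.system_fingerprint)
```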