Surprising logprobs outputs for first token if it's '0'

Hi, I’m making calls to the chat completions API to get gpt-3.5-turbo-0125 to classify a prompt as ‘0’ or ‘1’. If the first token in the response is ‘0’, then the linear probability always equals 1 + logprob for that token. But if the first token is anything else, or if ‘0’ is not the first token, we don’t get the same behaviour. Does anyone know why that’s the case? It’s not a problem per se, it’s just that when patterns show up in your statistics, you wonder why.
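For reference, here is a minimal sketch of the kind of call being described, using the official `openai` Python client. The prompt content and response handling are illustrative assumptions, and the API call itself is commented out so the conversion at the bottom runs standalone:

```python
import math

# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo-0125",
#     messages=[{"role": "user", "content": "Classify ... Answer only 0 or 1."}],
#     logprobs=True,
#     max_tokens=1,
# )
# first = resp.choices[0].logprobs.content[0]
# token, logprob = first.token, first.logprob

# Illustrative value of the kind the API returns for a near-certain '0':
token, logprob = "0", -4.3e-05

linear_prob = math.exp(logprob)  # convert logprob -> probability
print(token, logprob, linear_prob, 1 + logprob)
```

At this level of certainty, `linear_prob` and `1 + logprob` print as the same number, which is exactly the pattern being asked about.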

A probability is a value between 0 and 1, representing a statistical chance of 0%–100%.

Because all ~100k tokens in the BPE vocabulary are evaluated for their certainty, the probabilities of tokens in the tail of alternates quickly become extremely small. When you tell the AI to produce only a 0 or 1, the chance it produces the Chinese character for “book” is quite remote.

It becomes unwieldy to read small probabilities when they start with 20 fractional zeroes. Hence logprobs: the natural logarithm of the probability, which gives us the exponent in the formula \text{prob} = e^{\text{logprob}}

Instead of dealing with a number like 0.00000000000004, the logprob ln(tiny) is -30.85.
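You can verify that round trip in a couple of lines (the tiny value is the one from the sentence above):

```python
import math

tiny = 4e-14                 # i.e. 0.00000000000004
logprob = math.log(tiny)     # natural log gives the exponent
print(logprob)               # ≈ -30.85
print(math.exp(logprob))     # back to roughly 4e-14
```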

This also gives temperature control a reasonable range when directly acting on logarithmic logits.

Now what happens in your depiction when the probability approaches 100%? The logprob approaches zero from below, and since x^0 = 1 for any base, e^{\text{logprob}} approaches 1. Euler’s number hardly matters.

So really, you are seeing the first-order Taylor approximation e^x \approx 1 + x, which is extremely accurate when x (the logprob) is close to 0: the next term, x^2/2, is too small to show up at the displayed precision. That’s why the linear probability reads as exactly 1 + logprob for a near-certain ‘0’, and why the pattern breaks down for tokens with larger (more negative) logprobs. It’s an artifact of the math, not of the model.
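A quick check of how good that approximation is, and how it degrades as the logprob moves away from 0:

```python
import math

# Compare exp(x) with its first-order Taylor approximation 1 + x
# for logprobs of decreasing magnitude.
for logprob in (-0.1, -0.01, -0.001, -1e-05):
    prob = math.exp(logprob)
    approx = 1 + logprob
    print(f"{logprob:>8}: exp={prob:.8f}  1+logprob={approx:.8f}  diff={prob - approx:.2e}")
```

The difference shrinks quadratically, so for a token the model is all but certain of, `exp(logprob)` and `1 + logprob` are indistinguishable at any reasonable display precision.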

Another cute math trick that exists for no reason is that for small angles in radians, sin(angle) ≈ angle.
sin⁻¹(0.08) = 0.08008558, which looks like boobs.


You might find it more instructive to get the top-10 logprobs of the token position instead of just the one chosen.