OpenAI is messing with the values returned from logprobs.
Take a look at these top logprobs, collected by stepping a gpt-4o-mini completion forward one token at a time (increasing max_tokens on each call):
Sentence: My least
Predicted token: My, logprobs: -2.9756122e-05, linear probability: 100.0%
Predicted token: favorite, logprobs: -10.87503, linear probability: 0.0%
Predicted token: "My, logprobs: -11.75003, linear probability: 0.0%
Predicted token: My, logprobs: -13.12503, linear probability: 0.0%
Predicted token: my, logprobs: -15.37503, linear probability: 0.0%
Sentence: My least favorite
Predicted token: My, logprobs: -0.00016623331, linear probability: 99.98%
Predicted token: "My, logprobs: -9.000166, linear probability: 0.01%
Predicted token: activity, logprobs: -11.000166, linear probability: 0.0%
Predicted token: my, logprobs: -11.250166, linear probability: 0.0%
Predicted token: My, logprobs: -12.250166, linear probability: 0.0%
Sentence: My least favorite TV
Predicted token: My, logprobs: -0.018826004, linear probability: 98.14%
Predicted token: show, logprobs: -4.018826, linear probability: 1.8%
Predicted token: "My, logprobs: -8.018826, linear probability: 0.03%
Predicted token: shows, logprobs: -8.768826, linear probability: 0.02%
Predicted token: Show, logprobs: -8.893826, linear probability: 0.01%
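For reference, the "linear probability" column above is just exp(logprob) expressed as a percentage. A minimal sketch reproducing it, using the values copied from the gpt-4o-mini output above:

```python
import math

# Top candidates for "My least favorite TV", copied from the output above
top_logprobs = {
    "My": -0.018826004,
    "show": -4.018826,
    '"My': -8.018826,
    "shows": -8.768826,
    "Show": -8.893826,
}

for token, lp in top_logprobs.items():
    linear = math.exp(lp) * 100  # logprob -> linear probability, in percent
    print(f"{token!r}: {linear:.2f}%")
```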
Look at all the results of a run: within each position, every candidate ends with the same fractional digits, yet the top result carries more digits of that same “precision”.
You see rounded impossibilities such as [-9.00, -11.00, -11.25, -12.25], each with the same trailing digits appended: quantized offsets from the top logprob rather than raw model outputs.
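That pattern is easy to check: subtract the top candidate's logprob from each of the others and the differences collapse to exact multiples of 0.25, to within float noise. A quick sanity check using the "My least favorite" position above:

```python
# logprobs for the "My least favorite" position, copied from the output above
logprobs = [-0.00016623331, -9.000166, -11.000166, -11.250166, -12.250166]

# Difference of each candidate from the top candidate
deltas = [lp - logprobs[0] for lp in logprobs[1:]]
print([round(d, 4) for d in deltas])  # [-9.0, -11.0, -11.25, -12.25]

# Every delta lands on a multiple of 0.25, to within float noise
assert all(abs(d * 4 - round(d * 4)) < 1e-4 for d in deltas)
```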
Although I haven’t seen one in this run, a -9999 is just a sentinel for a probability approaching 0.
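For what it’s worth, exp(-9999) underflows to exactly 0.0 in double precision, so such a sentinel would read as a zero probability anyway:

```python
import math

# exp of a very negative logprob underflows to zero in IEEE-754 doubles
print(math.exp(-9999))  # 0.0
```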
It is clear someone at OpenAI is laughing at any developer hoping to employ logprobs meaningfully now.