Let’s see if the logprob values are broken on gpt-4.1, as such results would indicate.
A boolean answer is solicited, and the logprob is extracted from the position of the JSON string value that the model is instructed to produce:
Messages
SYSTEM
You are a binary classifier, answering every question with only Yes or No.
You are an expert at finding the best truthful boolean answer to any input question.
Regardless of the type of input or how inapplicable, you still must determine the best choice.
# Responses
You produce a JSON with key answer; the value of answer must be chosen from only enums:
['yes', 'no']
# Permitted JSON responses
## select one only from:
{"answer":"yes"}
{"answer":"no"}
USER
yes or no: Is a cashew apple actually a berry?
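For reference, here is a minimal sketch of how such a request can be made with the openai Python SDK so that logprobs come back at all; the system string variable, max_tokens, and the top_logprobs count of 4 are assumptions to match the outputs below:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # repeated below with model="gpt-4.1"
    messages=[
        {"role": "system", "content": system},  # the system prompt above
        {"role": "user", "content": "yes or no: Is a cashew apple actually a berry?"},
    ],
    logprobs=True,   # return the logprob of each sampled token
    top_logprobs=4,  # plus the 4 most likely alternatives per position
    max_tokens=10,
)
response_dict = response.model_dump()  # plain dict, used by the code further down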
gpt-4o
Response
RESPONSE content: {"answer":"no"}
RESPONSE token number(s): [1750]
Logprobs:
Token: "no"
Probability: 67.917288732095443%
Top Logprobs:
Token: "no"
Probability: 67.917288732095443%
Token: "yes"
Probability: 32.081855549896488%
Token: "Yes"
Probability: 0.000324992083312%
Token: " no"
Probability: 0.000119557905994%
running gpt-4.1
Response
RESPONSE content: {"answer":"no"}
RESPONSE token number(s): [1750]
Logprobs:
Token: "no"
Probability: 90.465043176398169%
Top Logprobs:
Token: "no"
Probability: 90.465043176398169%
Token: "yes"
Probability: 9.534945969075354%
Token: "No"
Probability: 0.000000099806134%
Token: "Yes"
Probability: 0.000000032402308%
Answer:
Logprobs have the expected distribution.
The task is so highly instructed that the logprob of every token leading up to the answer position is 0.0, i.e. a probability of 100%:
logprobs[0]
{'token': '{"', 'bytes': [123, 34], 'logprob': 0.0, 'top_logprobs': [{'token': '{"', 'bytes': [123, 34], 'logprob': 0.0, 'prob': 1.0}, {'token': "{'", 'bytes': [123, 39], 'logprob': -19.875, 'prob': 2.335593038800113e-09}, {'token': '```', 'bytes': [96, 96, 96], 'logprob': -21.5, 'prob': 4.59905537865397e-10}, {'token': '{\\"', 'bytes': [123, 92, 34], 'logprob': -23.375, 'prob': 7.052879851114916e-11}], 'prob': 1.0}
logprobs[1]
{'token': 'answer', 'bytes': [97, 110, 115, 119, 101, 114], 'logprob': 0.0, 'top_logprobs': [{'token': 'answer', 'bytes': [97, 110, 115, 119, 101, 114], 'logprob': 0.0, 'prob': 1.0}, {'token': 'ANSWER', 'bytes': [65, 78, 83, 87, 69, 82], 'logprob': -19.734375, 'prob': 2.688251109328749e-09}, {'token': ' answer', 'bytes': [32, 97, 110, 115, 119, 101, 114], 'logprob': -22.37109375, 'prob': 1.9246751109065142e-10}, {'token': '\tanswer', 'bytes': [9, 97, 110, 115, 119, 101, 114], 'logprob': -22.841796875, 'prob': 1.2020807997466449e-10}], 'prob': 1.0}
logprobs[2]
{'token': '":"', 'bytes': [34, 58, 34], 'logprob': 0.0, 'top_logprobs': [{'token': '":"', 'bytes': [34, 58, 34], 'logprob': 0.0, 'prob': 1.0}, {'token': '":', 'bytes': [34, 58], 'logprob': -22.375, 'prob': 1.917171513759029e-10}, {'token': "':'", 'bytes': [39, 58, 39], 'logprob': -26.0, 'prob': 5.109089028065546e-12}, {'token': '\\":\\"', 'bytes': [92, 34, 58, 92, 34], 'logprob': -30.4375, 'prob': 6.04173548070253e-14}], 'prob': 1.0}
Tip: be sure you are extracting from the correct token position. Number tokens are unjoinable and do not carry a leading space of their own, so in JSON the space must be added as its own token. You might be missing that your "99.9%" token is actually the space that comes before the value.
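Rather than hardcoding an index, you can scan for the answer position; a minimal sketch, assuming the enum strings never appear inside the JSON scaffolding tokens:

# locate the answer token by matching the enum values instead of guessing the index
content = response_dict["choices"][0]["logprobs"]["content"]
answer_index = next(
    i
    for i, entry in enumerate(content)
    if entry["token"].strip().strip('"').lower() in ("yes", "no")
)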
A "prob" field is added for human interpretation, calculated without an external function (avoiding "from math import exp", in case that is damaged elsewhere).
# create a logprobs object that also has probabilities
e = 2.718281828459
if response_dict["choices"][0]["logprobs"]:
    logprobs = response_dict["choices"][0]["logprobs"]["content"]
    for entry in logprobs:
        entry["prob"] = e ** entry["logprob"]  # "logprob" to probability
        for top in entry.get("top_logprobs", []):
            top["prob"] = e ** top["logprob"]  # same for each "top_logprobs" entry
lp = logprobs[3]  # the specific logprob entry for the actual answer token
Just to note: in my own example here, I rewrote the system message (where the injected enums are automated) so that the permitted JSON contains whitespace and the key names are enclosed in backticks, as output meant to be sent to an API. That advances the answer position one token forward, with the result of making the AI more sure about ambiguous fruits.
>>> for logprob in logprobs:
...     print(logprob['prob'])
1.0
1.0
0.9999920581810099
1.0
0.9975274032511579
1.0
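If a single confidence number is wanted, the alternatives at the answer position can also be renormalized over just the two enum values, folding case and leading-space variants of the same word together; a sketch, reusing answer_index from the search above:

# collapse the answer position's top_logprobs into P(yes) vs P(no)
lp = logprobs[answer_index]
totals = {"yes": 0.0, "no": 0.0}
for top in lp["top_logprobs"]:
    key = top["token"].strip().strip('"').lower()
    if key in totals:
        totals[key] += top["prob"]  # "no", " no", "No" all count as no
p_yes = totals["yes"] / (totals["yes"] + totals["no"])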