I’m wondering if there is a way to receive the top_p probabilities the same way it is possible to receive the logprobs.
My goal is to check whether a given word appears within the top_p set or not.
Thanks.
Interesting.
The logprobs returned by the API are often useful for finding out how an AI arrived at its answers.
However, there is one aspect that OpenAI's logprobs do not reflect: the nucleus sampling (top_p) parameter that was used in the API call.
This parameter does not affect the length of the logprobs lists returned. The API lets you request a maximum top_logprobs of 20, but the actual number of candidates used for the multinomial random sampling may be fewer than the 20 returned - sending "top_p": 0.5
may only consider, say, the top 4 tokens that occupy the top 50% of the probability distribution mass (inclusive of the token that crosses the threshold).
(the actual inference may include probabilities of special tokens or other artifacts, so there is no way to see the “true inference” - this obfuscation is by design)
You would have to truncate the list yourself for that case - converting the logprobs to probabilities and cumulatively adding them up - perhaps as a modification of the logprobs response object, or as an addition: a separate top_p_logprobs field, which includes only those candidates whose cumulative probability is still under 50%, plus the final candidate that starts below 50% (the one that pushes the total over the threshold), when you'd send top_p: 0.5.
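To make that concrete, here is a minimal sketch of the truncation idea using made-up probabilities (illustrative numbers, not values from a real API call): convert each logprob to a probability with exp(), accumulate, and keep candidates until the running total reaches top_p.

import math

# Hypothetical top_logprobs for one token position (illustrative numbers only).
top_logprobs = [("Soft", -0.9), ("Wh", -1.0), ("Tiny", -2.4), ("P", -2.6)]
top_p = 0.5

kept, cumulative = [], 0.0
for token, logprob in top_logprobs:
    if cumulative >= top_p:          # threshold already met; stop
        break
    kept.append((token, logprob))    # the candidate that crosses the threshold is included
    cumulative += math.exp(logprob)  # logprob -> probability

print(kept)                          # [('Soft', -0.9), ('Wh', -1.0)] for these numbers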
Gonna go gonzo and give you complete non-SDK Python demo code. (The code should be cookbook-quality, even though the forum doesn't display its full breadth.)
This script demonstrates how to make and augment an OpenAI chat completions response object with an additional “top_p_logprobs” hierarchy, alongside the existing “top_logprobs” of full length. The global variable TOP_P is used both as an API parameter (for nucleus sampling) and to filter each token’s top_logprobs list.
- Only those candidate tokens whose cumulative probability is below TOP_P are kept, except that the candidate that causes the cumulative probability to exceed TOP_P is also included.
If the available top_logprobs from the API (which may be limited by the API’s maximum, by default up to 20 tokens) do not sum to TOP_P, the function prints a warning indicating that the top_p_logprobs list is limited. Setting TOP_P == 1.0 (or omitting it) implies no limit, and no warnings will be shown.
import math
import os
import json
import httpx
# Global configuration variables:
TOP_P = 0.5 # Global nucleus sampling threshold (e.g. 0.5 means consider tokens until 50% probability mass)
MAX_TOP_LOGPROBS = 20 # Maximum number of top_logprobs returned by the API (can be up to 20)
# Model and sampling settings.
model = "gpt-4o-mini"
logprobs_enabled = True
max_completion_tokens = 3
###############################################################################
# Helper function to get the API request headers.
###############################################################################
def _get_headers() -> dict[str, str]:
    """
    Generates OpenAI authentication headers from environment variables for RESTful HTTP requests.
    It will use the optional OPENAI_ORG_ID and/or OPENAI_PROJECT_ID if you have set them.
    This function raises a ValueError if the OPENAI_API_KEY environment variable is not set.

    Returns:
        dict: Headers containing authorization and content type information.
    """
    if not os.getenv("OPENAI_API_KEY"):
        raise ValueError("Please set the OPENAI_API_KEY environment variable.")
    headers = {
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        "OpenAI-Beta": "assistants=v2",  # required by Assistants; ignored by other endpoints
        "Content-Type": "application/json",
    }
    if os.getenv("OPENAI_ORG_ID"):
        headers["OpenAI-Organization"] = os.getenv("OPENAI_ORG_ID")
    if os.getenv("OPENAI_PROJECT_ID"):
        headers["OpenAI-Project"] = os.getenv("OPENAI_PROJECT_ID")
    return headers
###############################################################################
# Function to augment the response object with a top_p_logprobs field.
###############################################################################
def augment_logprobs(response: dict, global_top_p: float = TOP_P) -> None:
    """
    Processes the logprobs information in an OpenAI chat completions response to create an additional
    "top_p_logprobs" list for each token. This new list is a subset of the existing "top_logprobs"
    that includes only those candidate tokens where the cumulative probability is below the
    nucleus sampling threshold (global_top_p), plus the candidate that causes the sum to meet or exceed it.
    If global_top_p is 1.0 or greater, the entire list of top_logprobs is copied.

    Additionally, if the available candidates (which may be limited by the API to MAX_TOP_LOGPROBS)
    do not sum up to global_top_p, a warning is printed:
        "warning: top_p_logprobs limited to 20 at token NN"
    where NN is the index of the token within the logprobs.content list.

    Args:
        response (dict): The chat completions response object from the API.
        global_top_p (float, optional): The desired nucleus sampling threshold. Defaults to TOP_P.

    Returns:
        None: The function modifies the response object in place.
    """
    # Iterate over each choice in the response.
    for choice in response.get("choices", []):
        logprobs_obj = choice.get("logprobs")
        if not logprobs_obj:
            continue
        content_list = logprobs_obj.get("content", [])
        for idx, token_obj in enumerate(content_list):
            if "top_logprobs" not in token_obj:
                continue  # Skip tokens that do not have top_logprobs information.
            original_candidates = token_obj["top_logprobs"]
            # If no nucleus sampling limitation, copy the full candidate list.
            if global_top_p >= 1.0:
                token_obj["top_p_logprobs"] = original_candidates.copy()
            else:
                cumulative_prob = 0.0
                selected_candidates = []
                for candidate in original_candidates:
                    # Compute the probability from the log probability.
                    candidate_prob = math.exp(candidate.get("logprob", float("-inf")))
                    # Include candidate if cumulative is still below threshold.
                    if cumulative_prob < global_top_p:
                        selected_candidates.append(candidate)
                        cumulative_prob += candidate_prob
                        # Even if this candidate causes the cumulative sum to meet or exceed
                        # the threshold, we include it and then stop.
                        if cumulative_prob >= global_top_p:
                            break
                    else:
                        break
                token_obj["top_p_logprobs"] = selected_candidates
                # If the sum of probabilities is less than global_top_p and the available
                # candidates reached the maximum limit, issue a warning.
                if (cumulative_prob < global_top_p) and (len(original_candidates) >= MAX_TOP_LOGPROBS):
                    print(f"warning: top_p_logprobs limited to {MAX_TOP_LOGPROBS} at token {idx}")
###############################################################################
# Demonstration: Sending a chat completion API call using httpx and transforming the response.
###############################################################################
# Step 1: Setup prompt and API parameters.
user_prompt = "Write the first line of a 'kitten haiku' as your only response"

# Step 2: Define messages (system and user) for the chat.
system_message = [
    {
        "type": "text",
        "text": "You are a helpful AI assistant",
    }
]
user_message = [
    {
        "type": "text",
        "text": user_prompt,
    }
]
messages = [
    {"role": "developer", "content": system_message},
    {"role": "user", "content": user_message},
]

# Step 3: Construct the API request body.
request_body = {
    "model": model,
    "messages": messages,
    "response_format": {"type": "text"},
    "stop": [],
    **({"max_completion_tokens": max_completion_tokens} if "max_completion_tokens" in locals() else {}),
    # Sampling and penalty parameters:
    "temperature": 1.0,
    "top_p": TOP_P,  # Use the global TOP_P variable.
    "logprobs": logprobs_enabled,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}
if request_body.get("tools"):
    request_body["tool_choice"] = "auto"
# When logprobs are enabled, specify the maximum number of top logit candidates to return.
if logprobs_enabled:
    request_body["top_logprobs"] = MAX_TOP_LOGPROBS  # This value can be set up to MAX_TOP_LOGPROBS (20).
# Step 4: Send the API request and process the response.
openai_chat_endpoint = "https://api.openai.com/v1/chat/completions"
try:
    # Send HTTP POST request to the OpenAI API.
    response = httpx.post(
        openai_chat_endpoint,
        headers=_get_headers(),
        json=request_body,
        timeout=500.0,
    )
    response.raise_for_status()  # Raise an exception for HTTP errors.
    response_json = response.json()

    # Print out the model name and first choice response for demonstration.
    print("Model:")
    print(json.dumps(response_json.get("model", "Unknown model"), indent=2))
    # print("\nOriginal First Choice:")
    # print(json.dumps(response_json["choices"][0], indent=2))

    # Step 5: Augment the response object by adding "top_p_logprobs" for each token.
    augment_logprobs(response_json, TOP_P)

    # For demonstration, print out the modified logprobs object from the first choice.
    print("\nTransformed logprobs with top_p_logprobs:")
    print(json.dumps(response_json["choices"][0].get("logprobs", {}), indent=2))
except httpx.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except Exception as e:
    print(f"An error occurred: {e}")
Here is the long formatted output of the new response object - with just max_completion_tokens = 3 it is already about 700 lines, so printing a full-length response this way could flood your console app.
Model:
"gpt-4o-mini-2024-07-18"
Transformed logprobs with top_p_logprobs:
{
"content": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
],
"top_logprobs": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
]
},
{
"token": "Wh",
"logprob": -0.89273024,
"bytes": [
87,
104
]
},
{
"token": "Tiny",
"logprob": -2.3927302,
"bytes": [
84,
105,
110,
121
]
},
{
"token": "P",
"logprob": -2.6427302,
"bytes": [
80
]
},
{
"token": "Gent",
"logprob": -5.01773,
"bytes": [
71,
101,
110,
116
]
},
{
"token": "Fl",
"logprob": -5.51773,
"bytes": [
70,
108
]
},
{
"token": "Play",
"logprob": -5.76773,
"bytes": [
80,
108,
97,
121
]
},
{
"token": "Cur",
"logprob": -6.64273,
"bytes": [
67,
117,
114
]
},
{
"token": "Small",
"logprob": -6.89273,
"bytes": [
83,
109,
97,
108,
108
]
},
{
"token": "Little",
"logprob": -7.26773,
"bytes": [
76,
105,
116,
116,
108,
101
]
},
{
"token": "F",
"logprob": -7.64273,
"bytes": [
70
]
},
{
"token": "Silent",
"logprob": -8.517731,
"bytes": [
83,
105,
108,
101,
110,
116
]
},
{
"token": "In",
"logprob": -8.642731,
"bytes": [
73,
110
]
},
{
"token": "A",
"logprob": -8.892731,
"bytes": [
65
]
},
{
"token": "Vel",
"logprob": -9.392731,
"bytes": [
86,
101,
108
]
},
{
"token": "B",
"logprob": -9.892731,
"bytes": [
66
]
},
{
"token": "C",
"logprob": -10.017731,
"bytes": [
67
]
},
{
"token": "Sun",
"logprob": -10.017731,
"bytes": [
83,
117,
110
]
},
{
"token": "T",
"logprob": -10.267731,
"bytes": [
84
]
},
{
"token": "Eyes",
"logprob": -10.267731,
"bytes": [
69,
121,
101,
115
]
}
],
"top_p_logprobs": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
]
},
{
"token": "Wh",
"logprob": -0.89273024,
"bytes": [
87,
104
]
}
]
},
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
],
"top_logprobs": [
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
]
},
{
"token": " p",
"logprob": -3.6638923,
"bytes": [
32,
112
]
},
{
"token": " paw",
"logprob": -4.9138923,
"bytes": [
32,
112,
97,
119
]
},
{
"token": " whisk",
"logprob": -5.5388923,
"bytes": [
32,
119,
104,
105,
115,
107
]
},
{
"token": " fur",
"logprob": -7.4138923,
"bytes": [
32,
102,
117,
114
]
},
{
"token": " whispers",
"logprob": -7.5388923,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114,
115
]
},
{
"token": "ly",
"logprob": -10.663892,
"bytes": [
108,
121
]
},
{
"token": " whisper",
"logprob": -10.788892,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114
]
},
{
"token": "est",
"logprob": -11.288892,
"bytes": [
101,
115,
116
]
},
{
"token": " little",
"logprob": -11.663892,
"bytes": [
32,
108,
105,
116,
116,
108,
101
]
},
{
"token": " pads",
"logprob": -11.913892,
"bytes": [
32,
112,
97,
100,
115
]
},
{
"token": " pat",
"logprob": -12.288892,
"bytes": [
32,
112,
97,
116
]
},
{
"token": ",",
"logprob": -12.913892,
"bytes": [
44
]
},
{
"token": " wh",
"logprob": -13.038892,
"bytes": [
32,
119,
104
]
},
{
"token": " tiny",
"logprob": -13.163892,
"bytes": [
32,
116,
105,
110,
121
]
},
{
"token": " kitten",
"logprob": -13.163892,
"bytes": [
32,
107,
105,
116,
116,
101,
110
]
},
{
"token": " eyes",
"logprob": -13.288892,
"bytes": [
32,
101,
121,
101,
115
]
},
{
"token": " velvet",
"logprob": -13.288892,
"bytes": [
32,
118,
101,
108,
118,
101,
116
]
},
{
"token": " P",
"logprob": -13.413892,
"bytes": [
32,
80
]
},
{
"token": " steps",
"logprob": -13.413892,
"bytes": [
32,
115,
116,
101,
112,
115
]
}
],
"top_p_logprobs": [
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
]
}
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
],
"top_logprobs": [
{
"token": " tread",
"logprob": -1.1457677,
"bytes": [
32,
116,
114,
101,
97,
100
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
]
},
{
"token": " dance",
"logprob": -2.2707677,
"bytes": [
32,
100,
97,
110,
99,
101
]
},
{
"token": " in",
"logprob": -2.3957677,
"bytes": [
32,
105,
110
]
},
{
"token": " p",
"logprob": -2.6457677,
"bytes": [
32,
112
]
},
{
"token": " tip",
"logprob": -2.6457677,
"bytes": [
32,
116,
105,
112
]
},
{
"token": " pad",
"logprob": -2.7707677,
"bytes": [
32,
112,
97,
100
]
},
{
"token": " pat",
"logprob": -3.6457677,
"bytes": [
32,
112,
97,
116
]
},
{
"token": " gently",
"logprob": -4.0207677,
"bytes": [
32,
103,
101,
110,
116,
108,
121
]
},
{
"token": " whisper",
"logprob": -4.5207677,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114
]
},
{
"token": " touch",
"logprob": -4.7707677,
"bytes": [
32,
116,
111,
117,
99,
104
]
},
{
"token": " softly",
"logprob": -4.8957677,
"bytes": [
32,
115,
111,
102,
116,
108,
121
]
},
{
"token": " tap",
"logprob": -5.1457677,
"bytes": [
32,
116,
97,
112
]
},
{
"token": " kne",
"logprob": -5.1457677,
"bytes": [
32,
107,
110,
101
]
},
{
"token": " creep",
"logprob": -5.6457677,
"bytes": [
32,
99,
114,
101,
101,
112
]
},
{
"token": " grace",
"logprob": -6.0207677,
"bytes": [
32,
103,
114,
97,
99,
101
]
},
{
"token": " stretch",
"logprob": -6.1457677,
"bytes": [
32,
115,
116,
114,
101,
116,
99,
104
]
},
{
"token": " pr",
"logprob": -6.3957677,
"bytes": [
32,
112,
114
]
},
{
"token": " press",
"logprob": -6.3957677,
"bytes": [
32,
112,
114,
101,
115,
115
]
},
{
"token": " chase",
"logprob": -6.3957677,
"bytes": [
32,
99,
104,
97,
115,
101
]
}
],
"top_p_logprobs": [
{
"token": " tread",
"logprob": -1.1457677,
"bytes": [
32,
116,
114,
101,
97,
100
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
]
},
{
"token": " dance",
"logprob": -2.2707677,
"bytes": [
32,
100,
97,
110,
99,
101
]
}
]
}
],
"refusal": null
}
You will see that we've achieved the goal: while each top_logprobs list has 20 entries, the filtered top_p_logprobs lists give us 2, 1, and 3 results for the three tokens of our max_completion_tokens = 3 "kitten haiku".
The function is reusable with just the one demonstration line that calls it, operating on the response as a plain Python dictionary (a result from the Python openai SDK library will first need to be converted with response.model_dump()).
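For example, a minimal sketch of that path with the openai SDK (assumes the same OPENAI_API_KEY environment variable; the model and parameters mirror the httpx demo above):

# Sketch: obtaining the response via the openai SDK instead of raw httpx.
from openai import OpenAI

client = OpenAI()
sdk_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write the first line of a 'kitten haiku' as your only response"}],
    max_completion_tokens=3,
    top_p=TOP_P,
    logprobs=True,
    top_logprobs=MAX_TOP_LOGPROBS,
)
response_dict = sdk_response.model_dump()  # pydantic model -> plain dict
augment_logprobs(response_dict, TOP_P)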
You can also re-normalize each result list so that it again represents the full 1.0 probability space that the sampler would have drawn from.
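A minimal sketch of such a re-normalization, operating in place on the augmented response dict (the helper name and the renormalized_prob field are just illustrative, not part of the API):

def renormalize_top_p_logprobs(response: dict) -> None:
    """Rescale each token's top_p_logprobs probabilities so they sum to 1.0 (in place)."""
    for choice in response.get("choices", []):
        content = (choice.get("logprobs") or {}).get("content", [])
        for token_obj in content:
            candidates = token_obj.get("top_p_logprobs")
            if not candidates:
                continue
            total = sum(math.exp(c["logprob"]) for c in candidates)
            for c in candidates:
                # Attach the renormalized probability next to the original logprob.
                c["renormalized_prob"] = math.exp(c["logprob"]) / total

# Example: renormalize the response augmented earlier in the demo script.
renormalize_top_p_logprobs(response_json)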