I’m wondering if there is a way to receive the top_p probabilities the same way it is possible to receive the logprobs.
My goal is to check whether a given word appears within the top_p set or not.
Thanks.
Interesting.
The logprobs returned by the API are often useful for finding out how an AI arrived at its answers.
However, there is one aspect that OpenAI's logprobs do not reflect: the nucleus sampling (top_p) parameter that was used in the API call.
This parameter does not affect the length of the logprobs lists returned. The API lets you request a maximum top_logprobs of 20, but the actual number of candidates used for the multinomial random sampling may be fewer than the 20 returned - sending "top_p": 0.5
may only consider, say, the top 4 tokens that occupy the top 50% of the probability distribution mass (inclusive of the token that crosses the threshold).
(the actual inference may include probabilities of special tokens or other artifacts, so there is no way to see the “true inference” - this obfuscation is by design)
You would have to truncate the list yourself for that case - converting the logprobs to probabilities and cumulatively adding them up - perhaps as a modification of the logprobs response object, or as an addition: a separate top_p_logprobs field, which includes only those candidates whose cumulative probability is still under 50%, plus the final candidate that starts below 50% (the one that pushes the total over the threshold), when you'd send top_p: 0.5.
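To make that concrete, here is a minimal sketch of the truncation idea using made-up probabilities (illustrative numbers, not values from a real API call): convert each logprob to a probability with exp(), accumulate, and keep candidates until the running total reaches top_p.

import math

# Hypothetical top_logprobs for one token position (illustrative numbers only).
top_logprobs = [("Soft", -0.9), ("Wh", -1.0), ("Tiny", -2.4), ("P", -2.6)]
top_p = 0.5

kept, cumulative = [], 0.0
for token, logprob in top_logprobs:
    if cumulative >= top_p:          # threshold already met; stop
        break
    kept.append((token, logprob))    # the candidate that crosses the threshold is included
    cumulative += math.exp(logprob)  # logprob -> probability

print(kept)                          # [('Soft', -0.9), ('Wh', -1.0)] for these numbers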
Gonna go gonzo and give you complete non-SDK Python demo code. (The code should be cookbook-quality, even though the forum doesn't display its full breadth.)
This script demonstrates how to make and augment an OpenAI chat completions response object with an additional “top_p_logprobs” hierarchy, alongside the existing “top_logprobs” of full length. The global variable TOP_P is used both as an API parameter (for nucleus sampling) and to filter each token’s top_logprobs list.
- Only those candidate tokens whose cumulative probability is below TOP_P are kept, except that the candidate that causes the cumulative probability to exceed TOP_P is also included.
If the available top_logprobs from the API (which may be limited by the API’s maximum, by default up to 20 tokens) do not sum to TOP_P, the function prints a warning indicating that the top_p_logprobs list is limited. Setting TOP_P == 1.0 (or omitting it) implies no limit, and no warnings will be shown.
import math
import os
import json
import httpx
# Global configuration variables:
TOP_P = 0.5 # Global nucleus sampling threshold (e.g. 0.5 means consider tokens until 50% probability mass)
MAX_TOP_LOGPROBS = 20 # Maximum number of top_logprobs returned by the API (can be up to 20)
# Model and sampling settings.
model = "gpt-4o-mini"
logprobs_enabled = True
max_completion_tokens = 3
###############################################################################
# Helper function to get the API request headers.
###############################################################################
def _get_headers() -> dict[str, str]:
    """
    Generates OpenAI authentication headers from environment variables for RESTful HTTP requests.
    It will use the optional OPENAI_ORG_ID and/or OPENAI_PROJECT_ID if you have set them.
    This function raises a ValueError if the OPENAI_API_KEY environment variable is not set.

    Returns:
        dict: Headers containing authorization and content type information.
    """
    if not os.getenv("OPENAI_API_KEY"):
        raise ValueError("Please set the OPENAI_API_KEY environment variable.")
    headers = {
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        "OpenAI-Beta": "assistants=v2",  # required by Assistants; ignored by other endpoints
        "Content-Type": "application/json",
    }
    if os.getenv("OPENAI_ORG_ID"):
        headers["OpenAI-Organization"] = os.getenv("OPENAI_ORG_ID")
    if os.getenv("OPENAI_PROJECT_ID"):
        headers["OpenAI-Project"] = os.getenv("OPENAI_PROJECT_ID")
    return headers
###############################################################################
# Function to augment the response object with a top_p_logprobs field.
###############################################################################
def augment_logprobs(response: dict, global_top_p: float = TOP_P) -> None:
    """
    Processes the logprobs information in an OpenAI chat completions response to create an additional
    "top_p_logprobs" list for each token. This new list is a subset of the existing "top_logprobs"
    that includes only those candidate tokens where the cumulative probability is below the
    nucleus sampling threshold (global_top_p), plus the candidate that causes the sum to meet or exceed it.
    If global_top_p is 1.0 or greater, the entire list of top_logprobs is copied.

    Additionally, if the available candidates (which may be limited by the API to MAX_TOP_LOGPROBS)
    do not sum up to global_top_p, a warning is printed:
        "warning: top_p_logprobs limited to 20 at token NN"
    where NN is the index of the token within the logprobs.content list.

    Args:
        response (dict): The chat completions response object from the API.
        global_top_p (float, optional): The desired nucleus sampling threshold. Defaults to TOP_P.

    Returns:
        None: The function modifies the response object in place.
    """
    # Iterate over each choice in the response.
    for choice in response.get("choices", []):
        logprobs_obj = choice.get("logprobs")
        if not logprobs_obj:
            continue
        content_list = logprobs_obj.get("content", [])
        for idx, token_obj in enumerate(content_list):
            if "top_logprobs" not in token_obj:
                continue  # Skip tokens that do not have top_logprobs information.
            original_candidates = token_obj["top_logprobs"]
            # If no nucleus sampling limitation, copy the full candidate list.
            if global_top_p >= 1.0:
                token_obj["top_p_logprobs"] = original_candidates.copy()
            else:
                cumulative_prob = 0.0
                selected_candidates = []
                for candidate in original_candidates:
                    # Compute the probability from the log probability.
                    candidate_prob = math.exp(candidate.get("logprob", float("-inf")))
                    # Include candidate if cumulative is still below threshold.
                    if cumulative_prob < global_top_p:
                        selected_candidates.append(candidate)
                        cumulative_prob += candidate_prob
                        # Even if this candidate causes the cumulative sum to meet or exceed
                        # the threshold, we include it and then stop.
                        if cumulative_prob >= global_top_p:
                            break
                    else:
                        break
                token_obj["top_p_logprobs"] = selected_candidates
                # If the sum of probabilities is less than global_top_p and the available
                # candidates reached the maximum limit, issue a warning.
                if (cumulative_prob < global_top_p) and (len(original_candidates) >= MAX_TOP_LOGPROBS):
                    print(f"warning: top_p_logprobs limited to {MAX_TOP_LOGPROBS} at token {idx}")
###############################################################################
# Demonstration: Sending a chat completion API call using httpx and transforming the response.
###############################################################################
# Step 1: Setup prompt and API parameters.
user_prompt = "Write the first line of a 'kitten haiku' as your only response"

# Step 2: Define messages (system and user) for the chat.
system_message = [
    {
        "type": "text",
        "text": "You are a helpful AI assistant",
    }
]
user_message = [
    {
        "type": "text",
        "text": user_prompt,
    }
]
messages = [
    {"role": "developer", "content": system_message},
    {"role": "user", "content": user_message},
]

# Step 3: Construct the API request body.
request_body = {
    "model": model,
    "messages": messages,
    "response_format": {"type": "text"},
    "stop": [],
    **({"max_completion_tokens": max_completion_tokens} if "max_completion_tokens" in locals() else {}),
    # Sampling and penalty parameters:
    "temperature": 1.0,
    "top_p": TOP_P,  # Use the global TOP_P variable.
    "logprobs": logprobs_enabled,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}
if request_body.get("tools"):
    request_body["tool_choice"] = "auto"
# When logprobs are enabled, specify the maximum number of top logit candidates to return.
if logprobs_enabled:
    request_body["top_logprobs"] = MAX_TOP_LOGPROBS  # This value can be set up to MAX_TOP_LOGPROBS (20).
# Step 4: Send the API request and process the response.
openai_chat_endpoint = "https://api.openai.com/v1/chat/completions"
try:
    # Send HTTP POST request to the OpenAI API.
    response = httpx.post(
        openai_chat_endpoint,
        headers=_get_headers(),
        json=request_body,
        timeout=500.0,
    )
    response.raise_for_status()  # Raise an exception for HTTP errors.
    response_json = response.json()

    # Print out the model name and first choice response for demonstration.
    print("Model:")
    print(json.dumps(response_json.get("model", "Unknown model"), indent=2))
    # print("\nOriginal First Choice:")
    # print(json.dumps(response_json["choices"][0], indent=2))

    # Step 5: Augment the response object by adding "top_p_logprobs" for each token.
    augment_logprobs(response_json, TOP_P)

    # For demonstration, print out the modified logprobs object from the first choice.
    print("\nTransformed logprobs with top_p_logprobs:")
    print(json.dumps(response_json["choices"][0].get("logprobs", {}), indent=2))
except httpx.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except Exception as e:
    print(f"An error occurred: {e}")
Here is the long formatted output of the new response object - with just max_completion_tokens = 3 it is already about 700 lines, so printing a full-length response this way could flood your console app.
Model:
"gpt-4o-mini-2024-07-18"
Transformed logprobs with top_p_logprobs:
{
"content": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
],
"top_logprobs": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
]
},
{
"token": "Wh",
"logprob": -0.89273024,
"bytes": [
87,
104
]
},
{
"token": "Tiny",
"logprob": -2.3927302,
"bytes": [
84,
105,
110,
121
]
},
{
"token": "P",
"logprob": -2.6427302,
"bytes": [
80
]
},
{
"token": "Gent",
"logprob": -5.01773,
"bytes": [
71,
101,
110,
116
]
},
{
"token": "Fl",
"logprob": -5.51773,
"bytes": [
70,
108
]
},
{
"token": "Play",
"logprob": -5.76773,
"bytes": [
80,
108,
97,
121
]
},
{
"token": "Cur",
"logprob": -6.64273,
"bytes": [
67,
117,
114
]
},
{
"token": "Small",
"logprob": -6.89273,
"bytes": [
83,
109,
97,
108,
108
]
},
{
"token": "Little",
"logprob": -7.26773,
"bytes": [
76,
105,
116,
116,
108,
101
]
},
{
"token": "F",
"logprob": -7.64273,
"bytes": [
70
]
},
{
"token": "Silent",
"logprob": -8.517731,
"bytes": [
83,
105,
108,
101,
110,
116
]
},
{
"token": "In",
"logprob": -8.642731,
"bytes": [
73,
110
]
},
{
"token": "A",
"logprob": -8.892731,
"bytes": [
65
]
},
{
"token": "Vel",
"logprob": -9.392731,
"bytes": [
86,
101,
108
]
},
{
"token": "B",
"logprob": -9.892731,
"bytes": [
66
]
},
{
"token": "C",
"logprob": -10.017731,
"bytes": [
67
]
},
{
"token": "Sun",
"logprob": -10.017731,
"bytes": [
83,
117,
110
]
},
{
"token": "T",
"logprob": -10.267731,
"bytes": [
84
]
},
{
"token": "Eyes",
"logprob": -10.267731,
"bytes": [
69,
121,
101,
115
]
}
],
"top_p_logprobs": [
{
"token": "Soft",
"logprob": -0.89273024,
"bytes": [
83,
111,
102,
116
]
},
{
"token": "Wh",
"logprob": -0.89273024,
"bytes": [
87,
104
]
}
]
},
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
],
"top_logprobs": [
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
]
},
{
"token": " p",
"logprob": -3.6638923,
"bytes": [
32,
112
]
},
{
"token": " paw",
"logprob": -4.9138923,
"bytes": [
32,
112,
97,
119
]
},
{
"token": " whisk",
"logprob": -5.5388923,
"bytes": [
32,
119,
104,
105,
115,
107
]
},
{
"token": " fur",
"logprob": -7.4138923,
"bytes": [
32,
102,
117,
114
]
},
{
"token": " whispers",
"logprob": -7.5388923,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114,
115
]
},
{
"token": "ly",
"logprob": -10.663892,
"bytes": [
108,
121
]
},
{
"token": " whisper",
"logprob": -10.788892,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114
]
},
{
"token": "est",
"logprob": -11.288892,
"bytes": [
101,
115,
116
]
},
{
"token": " little",
"logprob": -11.663892,
"bytes": [
32,
108,
105,
116,
116,
108,
101
]
},
{
"token": " pads",
"logprob": -11.913892,
"bytes": [
32,
112,
97,
100,
115
]
},
{
"token": " pat",
"logprob": -12.288892,
"bytes": [
32,
112,
97,
116
]
},
{
"token": ",",
"logprob": -12.913892,
"bytes": [
44
]
},
{
"token": " wh",
"logprob": -13.038892,
"bytes": [
32,
119,
104
]
},
{
"token": " tiny",
"logprob": -13.163892,
"bytes": [
32,
116,
105,
110,
121
]
},
{
"token": " kitten",
"logprob": -13.163892,
"bytes": [
32,
107,
105,
116,
116,
101,
110
]
},
{
"token": " eyes",
"logprob": -13.288892,
"bytes": [
32,
101,
121,
101,
115
]
},
{
"token": " velvet",
"logprob": -13.288892,
"bytes": [
32,
118,
101,
108,
118,
101,
116
]
},
{
"token": " P",
"logprob": -13.413892,
"bytes": [
32,
80
]
},
{
"token": " steps",
"logprob": -13.413892,
"bytes": [
32,
115,
116,
101,
112,
115
]
}
],
"top_p_logprobs": [
{
"token": " paws",
"logprob": -0.03889219,
"bytes": [
32,
112,
97,
119,
115
]
}
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
],
"top_logprobs": [
{
"token": " tread",
"logprob": -1.1457677,
"bytes": [
32,
116,
114,
101,
97,
100
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
]
},
{
"token": " dance",
"logprob": -2.2707677,
"bytes": [
32,
100,
97,
110,
99,
101
]
},
{
"token": " in",
"logprob": -2.3957677,
"bytes": [
32,
105,
110
]
},
{
"token": " p",
"logprob": -2.6457677,
"bytes": [
32,
112
]
},
{
"token": " tip",
"logprob": -2.6457677,
"bytes": [
32,
116,
105,
112
]
},
{
"token": " pad",
"logprob": -2.7707677,
"bytes": [
32,
112,
97,
100
]
},
{
"token": " pat",
"logprob": -3.6457677,
"bytes": [
32,
112,
97,
116
]
},
{
"token": " gently",
"logprob": -4.0207677,
"bytes": [
32,
103,
101,
110,
116,
108,
121
]
},
{
"token": " whisper",
"logprob": -4.5207677,
"bytes": [
32,
119,
104,
105,
115,
112,
101,
114
]
},
{
"token": " touch",
"logprob": -4.7707677,
"bytes": [
32,
116,
111,
117,
99,
104
]
},
{
"token": " softly",
"logprob": -4.8957677,
"bytes": [
32,
115,
111,
102,
116,
108,
121
]
},
{
"token": " tap",
"logprob": -5.1457677,
"bytes": [
32,
116,
97,
112
]
},
{
"token": " kne",
"logprob": -5.1457677,
"bytes": [
32,
107,
110,
101
]
},
{
"token": " creep",
"logprob": -5.6457677,
"bytes": [
32,
99,
114,
101,
101,
112
]
},
{
"token": " grace",
"logprob": -6.0207677,
"bytes": [
32,
103,
114,
97,
99,
101
]
},
{
"token": " stretch",
"logprob": -6.1457677,
"bytes": [
32,
115,
116,
114,
101,
116,
99,
104
]
},
{
"token": " pr",
"logprob": -6.3957677,
"bytes": [
32,
112,
114
]
},
{
"token": " press",
"logprob": -6.3957677,
"bytes": [
32,
112,
114,
101,
115,
115
]
},
{
"token": " chase",
"logprob": -6.3957677,
"bytes": [
32,
99,
104,
97,
115,
101
]
}
],
"top_p_logprobs": [
{
"token": " tread",
"logprob": -1.1457677,
"bytes": [
32,
116,
114,
101,
97,
100
]
},
{
"token": " on",
"logprob": -1.7707677,
"bytes": [
32,
111,
110
]
},
{
"token": " dance",
"logprob": -2.2707677,
"bytes": [
32,
100,
97,
110,
99,
101
]
}
]
}
],
"refusal": null
}
You will see that we've achieved the goal: while each top_logprobs list has 20 entries, the filtered top_p_logprobs lists give us 2, 1, and 3 results for the three tokens of our max_completion_tokens = 3 "kitten haiku".
The function is reusable with just the one demonstration line that calls it, operating on the response as a plain Python dictionary (a result from the Python openai SDK library will first need to be converted with response.model_dump()).
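For example, a minimal sketch of that path with the openai SDK (assumes the same OPENAI_API_KEY environment variable; the model and parameters mirror the httpx demo above):

# Sketch: obtaining the response via the openai SDK instead of raw httpx.
from openai import OpenAI

client = OpenAI()
sdk_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write the first line of a 'kitten haiku' as your only response"}],
    max_completion_tokens=3,
    top_p=TOP_P,
    logprobs=True,
    top_logprobs=MAX_TOP_LOGPROBS,
)
response_dict = sdk_response.model_dump()  # pydantic model -> plain dict
augment_logprobs(response_dict, TOP_P)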
You can also re-normalize each result list so that it again represents the full 1.0 probability space that the sampler would have drawn from.
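A minimal sketch of such a re-normalization, operating in place on the augmented response dict (the helper name and the renormalized_prob field are just illustrative, not part of the API):

def renormalize_top_p_logprobs(response: dict) -> None:
    """Rescale each token's top_p_logprobs probabilities so they sum to 1.0 (in place)."""
    for choice in response.get("choices", []):
        content = (choice.get("logprobs") or {}).get("content", [])
        for token_obj in content:
            candidates = token_obj.get("top_p_logprobs")
            if not candidates:
                continue
            total = sum(math.exp(c["logprob"]) for c in candidates)
            for c in candidates:
                # Attach the renormalized probability next to the original logprob.
                c["renormalized_prob"] = math.exp(c["logprob"]) / total

# Example: renormalize the response augmented earlier in the demo script.
renormalize_top_p_logprobs(response_json)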