Irrelevant Top Log Probabilities in openai.Completions

diliprk · January 9, 2023, 11:46am

Hi,
Earlier when I was using the deprecated openai.answers.create endpoint, I was able to get the top two log probs like this {' No': -1.2676435, ' Yes': -0.37322283} when validating binary Yes or No questions.

Now I have switched to the new openAI.completions endpoint and provide the following custom engineered prompt, like below:

full_prompt = "
Instruction: Please give only Yes or No answers to the following question based on the given context. If context is empty return 'NA'.

===
Context: [<some sentence which contains some information about X and Y>]
===
Q: Is X also known as Y?
A:"

completion_result = openai.Completion.create(
        engine= "text-davinci-003",
        prompt = full_prompt,
        logit_bias = None,
        temperature=0.01,
        n = 1,
        max_tokens = 2,
        stop = None,
        logprobs = 2,
    )

But when I run the openAI.completions API call I get irrelevant \ incorrect log probs like:

{' Yes': -0.0037726616, 'Yes': -6.203214}
{'\n': -1.1957816, ' No': -0.67463875}

I just want log probs for only Yes or No instead of Yes occurring two times or with stop words like \n.

How do I ensure that I get proper log props with only Yes or No in the openai.completions API response?

Best Regards,
Dilip

Dent · January 9, 2023, 7:35pm

I’d recommend adding \s after A: to eliminate the whitespace problem altogether. It’s seeing that Q: has a whitespace after it, so it’s including a whitespace-led variation of the answer in the probabilities

diliprk · January 10, 2023, 8:45am

Hi Dent,
Thank you taking the time to reply, I don’t see the \n in the top_log_probs field in the API response now , but still the duplicate responses or tokens ( Yes and Yes ) in the top_logprobs field still prevails.

I have constructed my problem in this code below, which you can directly run in a Jupyter notebook on your python virtual environment (where openAI is installed) and API key is configured.

import openai
import pandas as pd
from pprint import pprint
from ast import literal_eval

url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTGuONBhr1yefCz619MugO9aT2ATxWcvsL41W4ZLYrIBCjdeHPXJthX8OMVrCnTpSHF6T4Cv_ujkP86/pub?gid=1605389139&single=true&output=csv"
df = pd.read_csv(url)
df['documents'] = df['documents'].apply(literal_eval) ## to read documents column as a list
df['nr_docs_final'] = df['documents'].apply(len)
df['documents'] = df['documents'].astype(str)
df['questions'] = df.apply(lambda x : f"Is {x.Pref_Name} also known as {x.Alternate_Name}?", axis = 1)


ANSWERS_INSTRUCTION = "Please give only Yes or No answers to the following question based on the given Context. If Context is empty return NA"
CONTEXT_TEMPLATE = "===\nContext: {context}\n===\n"


def answers(question, model, documents=None, logit_bias=None, 
            max_rerank=200, max_tokens=2, print_response_flag = False,
            temperature=0.01, top_logprobs=2, stop=None, n=1 ):
    """
    Given a prompt, answer the question according to the given instruction
    """
    prompt = f"Q:{question}\nA:"
    logit_bias = logit_bias if logit_bias is not None else {}
    instruction = f"Instruction: {ANSWERS_INSTRUCTION.strip()}\n\n" if ANSWERS_INSTRUCTION != None else ""
    context = CONTEXT_TEMPLATE.format(context = documents)    
    full_prompt = instruction + context + prompt
    # print("PROMPT:\n",full_prompt)
    
    ## Call openai.Completions API
    completion_result = openai.Completion.create(
        engine=model,
        prompt=full_prompt,
        logit_bias=logit_bias,
        temperature=temperature,
        n=n,
        max_tokens=max_tokens,
        stop=stop,
        logprobs=top_logprobs,
    )
    
    if print_response_flag:
        print("COMPLETION API RAW RESPONSE:") 
        pprint(completion_result)
        
    top_prob_answers = completion_result["choices"][0]['logprobs']["top_logprobs"][0]
    result = dict(
        object="answer",
        completion=completion_result["id"],
    )
    
    result["top_logprobs"] = dict(top_prob_answers)
    
    result["answers"] = [
        item["text"].replace("A:", "").split("Q:")[0].strip()
        for item in completion_result["choices"]]
    return result

## PRINT a SAMPLE Test Point
index = 0
qstn = df['questions'].iloc[index]
docs = df['documents'].iloc[index]
response = answers(question = qstn, model = "text-davinci-003", max_tokens = 2, documents = docs,
                    print_response_flag = True, temperature=0.01, top_logprobs = 2, stop = None, n = 1)

print("FINAL RESPONSE:")
pprint(response)


## Get Responses for entire Column
df['openai_response'] = df.apply(lambda x : answers(question = x.questions, model = "text-davinci-003", documents = x.documents), axis = 1)

df['answers'] = df['openai_response'].apply(lambda x : x['answers'][0])
df['top_logprobs'] = df['openai_response'].apply(lambda x : x['top_logprobs'])
print(df)

Any help in removing the redundant tokens in the Completions API top_logprobs responses will be greatly appreciated

Best Regards,
Dilip

Dent · January 10, 2023, 9:10am

I’m too dumb and tired to get it up and running rn. I gave up after 3 dependency errors

This is still missing \s after A:

It should look like this:

prompt = f"Q:{question}\nA:\s"
or
prompt = f"Q:{question}\nA: "

Regardless, what is the exact print output of the pprint(response) call? In your previous post, it looked like two dicts, but I’m not familiar enough with pprint to know how it formats multi-line dicts. If it is just one, you should be able to do del response[‘\n’] (though that’s mutating data, which I loathe)

diliprk · January 10, 2023, 9:21am

Your suggestion of adding a \s to the prompt prompt = f"Q:{question}\nA:\s" is not making any difference, in the output. You just have to install pandas and openai (ignore GPT2TokenizerFast I have removed that in previous post).

Anyways, if setting up openai on your machine is difficult, here is the RAW Response:

COMPLETION API RAW RESPONSE:
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'logprobs': {'text_offset': [617, 620],
                           'token_logprobs': [-0.00767172, -0.00018343095],
                           'tokens': ['Yes', '<|endoftext|>'],
                           'top_logprobs': [{' Yes': -4.965048,
                                             'Yes': -0.00767172},
                                            {'<|endoftext|>': -0.00018343095,
                                             '\\': -9.280664}]},
              'text': 'Yes'}],
 'created': 1673342177,
 'id': 'cmpl-6X51tqUaYmnBA5p7EA5Fui0ZZkQzw',
 'model': 'text-davinci-003',
 'object': 'text_completion',
 'usage': {'completion_tokens': 1,
           'prompt_tokens': 148,
           'total_tokens': 149}}

from which I extract the following in to the respective columns:

{'answers': ['Yes'],
 'top_logprobs': {' Yes': -4.965048, 'Yes': -0.00767172}}

Dent · January 10, 2023, 11:23am

Oh. Gross. Here I thought I was being clever with string manipulation.

If nothing else, you’re gonna hafta do math at those to combine the values, and I’m a bit (read: very) rusty on how to operate on logprobs. You certainly can’t just drop one of the Yess after the calculation.

Hopefully someone else is more familiar, bc I don’t want to risk giving you incorrect info.

I’m still confused about why it’s still trying to add a space char when there already is one…

raymonddavey · January 10, 2023, 5:58pm

If you want exact outputs like the old endpoint, you are probably going to need to use the embedding classification method instead of just relying on GPT

It will mean a bit more work at your end, but you look like you are keen to get it working.

Let me know if you cant figure out how the classifiers in embedding work. I may be able to help - but not for a couple of days (snowed under right now)

diliprk · January 11, 2023, 6:16am

Hi @raymonddavey ,
Thanks for your feedback and your offer to help. In the toy problem (dataset) which I shared above there is a GroundTruth column (Labelled data) to do a classification task but in the real-world problem that I am using this, there is NO GroundTruth column (labelled data), so we have to use openAI to get the labelled response. Anyways, thanks again I will continue with this completions approach

maverickPramit · April 28, 2023, 7:03pm

Post is old, but sharing a response to computing probability score for classification use-case in-case helpful,
#Reference: Getting the Most Out of GPT-3-based Text Classifiers: Part 3 | Label Probabilities and Multi-Label Output | Edge Analytics

def prob_for_label(label: str, logprobs) → float:
“”"
Returns the predicted probability for the given label between 0.0 and 1.0.
“”"
# Initialize probability for this label to zero.
prob = 0.0
# Look at the first entry in logprobs. This represents the
# probabilities for the very next token.
next_logprobs = logprobs[0]
for s, logprob in next_logprobs.items():
# We want labels to be considered case-insensitive. In
# other words:
#
# prob_for_label(“vegetable”) =
# prob(“vegetable”) + prob(“Vegetable”)
#
s = s.lower().strip()
if label.lower() == s:
# If the prediction matches one of the labels, add
# the probability to the total probability for that
# label.
prob += logprob_to_prob(logprob)
elif label.lower().startswith(s):
# If the prediction is a prefix of one of the labels, we
# need to recur. Multiply the probability of the prefix
# by the probability of the remaining part of the label.
# In other words:
#
# prob_for_label(“vegetable”) =
# prob(“vege”) * prob(“table”)
#
rest_of_label = label[len(s) :]
remaining_logprobs = logprobs[1:]
prob += logprob * prob_for_label(
rest_of_label,
remaining_logprobs,
)
return prob

Topic		Replies	Views
Zero shot classification with OpenAI - response bias towards first label? API api	11	2365	August 6, 2023
Interpret scores generated by GPT4 Vision in multilabel classification API gpt-4 , gpt-4-vision	3	889	May 1, 2024
How do logprobs work for chat completion API (for GPT-4.1) API gpt-4	1	398	September 11, 2025
Classification with generative models Prompting gpt-4 , chatgpt	4	5224	January 25, 2024
Logprobs for specific tokens, not just top tokens API api	7	2259	June 23, 2025

Irrelevant Top Log Probabilities in openai.Completions

Related topics