Irrelevant Top Log Probabilities in openai.Completions

Earlier, when I was using the deprecated openai.answers.create endpoint, I was able to get the top two logprobs like this: {' No': -1.2676435, ' Yes': -0.37322283} when validating binary Yes/No questions.

Now I have switched to the new openai.Completion endpoint and provide the following custom-engineered prompt:

full_prompt = """
Instruction: Please give only Yes or No answers to the following question based on the given context. If context is empty return 'NA'.

Context: [<some sentence which contains some information about X and Y>]
Q: Is X also known as Y?
"""

completion_result = openai.Completion.create(
    engine="text-davinci-003",
    prompt=full_prompt,
    logit_bias=None,
    n=1,
    max_tokens=2,
    stop=None,
    logprobs=2,
)

But when I run the openai.Completion API call, I get irrelevant/incorrect logprobs like:

{' Yes': -0.0037726616, 'Yes': -6.203214}
{'\n': -1.1957816, ' No': -0.67463875}

I just want logprobs for only 'Yes' or 'No', instead of 'Yes' occurring twice or tokens like '\n' appearing.

How do I ensure that I get proper logprobs with only 'Yes' or 'No' in the openai.Completion API response?

Best Regards,

I’d recommend adding \s after A: to eliminate the whitespace problem altogether. It’s seeing that Q: has a whitespace after it, so it’s including a whitespace-led variation of the answer in the probabilities.


Hi Dent,
Thank you for taking the time to reply. I don’t see the '\n' in the top_logprobs field in the API response now, but the duplicate tokens (' Yes' and 'Yes') still show up in the top_logprobs field.

I have reproduced my problem in the code below, which you can run directly in a Jupyter notebook in a Python virtual environment where openai is installed and the API key is configured.

import openai
import pandas as pd
from pprint import pprint
from ast import literal_eval

url = ""
df = pd.read_csv(url)
df['documents'] = df['documents'].apply(literal_eval) ## to read documents column as a list
df['nr_docs_final'] = df['documents'].apply(len)
df['documents'] = df['documents'].astype(str)
df['questions'] = df.apply(lambda x : f"Is {x.Pref_Name} also known as {x.Alternate_Name}?", axis = 1)

ANSWERS_INSTRUCTION = "Please give only Yes or No answers to the following question based on the given Context. If Context is empty return NA"
CONTEXT_TEMPLATE = "===\nContext: {context}\n===\n"
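For reference, here is a minimal sketch of how those two templates assemble into the final prompt in the code below (the context sentence is made up for illustration):

```python
# Minimal sketch of the prompt assembly; the context sentence is invented.
ANSWERS_INSTRUCTION = "Please give only Yes or No answers to the following question based on the given Context. If Context is empty return NA"
CONTEXT_TEMPLATE = "===\nContext: {context}\n===\n"

instruction = f"Instruction: {ANSWERS_INSTRUCTION.strip()}\n\n"
context = CONTEXT_TEMPLATE.format(context="['X is also called Y.']")
prompt = "Q:Is X also known as Y?\nA:"
full_prompt = instruction + context + prompt
print(full_prompt)
```

Note that the prompt ends with "A:" and no trailing space, which is relevant to the whitespace discussion above.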

def answers(question, model, documents=None, logit_bias=None,
            max_rerank=200, max_tokens=2, print_response_flag=False,
            temperature=0.01, top_logprobs=2, stop=None, n=1):
    """Given a prompt, answer the question according to the given instruction."""
    prompt = f"Q:{question}\nA:"
    logit_bias = logit_bias if logit_bias is not None else {}
    instruction = f"Instruction: {ANSWERS_INSTRUCTION.strip()}\n\n" if ANSWERS_INSTRUCTION is not None else ""
    context = CONTEXT_TEMPLATE.format(context=documents)
    full_prompt = instruction + context + prompt
    # print("PROMPT:\n", full_prompt)
    ## Call openai.Completion API
    completion_result = openai.Completion.create(
        engine=model,
        prompt=full_prompt,
        logit_bias=logit_bias,
        temperature=temperature,
        n=n,
        max_tokens=max_tokens,
        stop=stop,
        logprobs=top_logprobs,
    )
    if print_response_flag:
        pprint(completion_result)
    top_prob_answers = completion_result["choices"][0]["logprobs"]["top_logprobs"][0]
    result = dict()
    result["top_logprobs"] = dict(top_prob_answers)
    result["answers"] = [
        item["text"].replace("A:", "").split("Q:")[0].strip()
        for item in completion_result["choices"]]
    return result

## PRINT a SAMPLE Test Point
index = 0
qstn = df['questions'].iloc[index]
docs = df['documents'].iloc[index]
response = answers(question = qstn, model = "text-davinci-003", max_tokens = 2, documents = docs,
                    print_response_flag = True, temperature=0.01, top_logprobs = 2, stop = None, n = 1)


## Get Responses for entire Column
df['openai_response'] = df.apply(lambda x : answers(question = x.questions, model = "text-davinci-003", documents = x.documents), axis = 1)

df['answers'] = df['openai_response'].apply(lambda x : x['answers'][0])
df['top_logprobs'] = df['openai_response'].apply(lambda x : x['top_logprobs'])

Any help in removing the redundant tokens in the Completions API top_logprobs responses will be greatly appreciated :pray:

Best Regards,

I’m too dumb and tired to get it up and running rn. I gave up after 3 dependency errors

This is still missing \s after A:

It should look like this:

prompt = f"Q:{question}\nA:\s"
prompt = f"Q:{question}\nA: "

Regardless, what is the exact print output of the pprint(response) call? In your previous post, it looked like two dicts, but I’m not familiar enough with pprint to know how it formats multi-line dicts. If it is just one, you should be able to do del response['\n'] (though that’s mutating data, which I loathe).
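A slightly safer variant of that deletion, sketched here with the dict from the earlier output, is dict.pop with a default, which won't raise KeyError if '\n' happens to be absent:

```python
top_logprobs = {'\n': -1.1957816, ' No': -0.67463875}  # example values from the thread
top_logprobs.pop('\n', None)  # no KeyError even if the key is missing
print(top_logprobs)
```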


Your suggestion of adding \s to the prompt (prompt = f"Q:{question}\nA:\s") is not making any difference in the output. You just have to install pandas and openai (ignore GPT2TokenizerFast; I removed that in the previous post).

Anyway, if setting up openai on your machine is difficult, here is the raw response:

{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'logprobs': {'text_offset': [617, 620],
                           'token_logprobs': [-0.00767172, -0.00018343095],
                           'tokens': ['Yes', '<|endoftext|>'],
                           'top_logprobs': [{' Yes': -4.965048,
                                             'Yes': -0.00767172},
                                            {'<|endoftext|>': -0.00018343095,
                                             '\\': -9.280664}]},
              'text': 'Yes'}],
 'created': 1673342177,
 'id': 'cmpl-6X51tqUaYmnBA5p7EA5Fui0ZZkQzw',
 'model': 'text-davinci-003',
 'object': 'text_completion',
 'usage': {'completion_tokens': 1,
           'prompt_tokens': 148,
           'total_tokens': 149}}

from which I extract the following into the respective columns:

{'answers': ['Yes'],
 'top_logprobs': {' Yes': -4.965048, 'Yes': -0.00767172}}

Oh. Gross. Here I thought I was being clever with string manipulation.

If nothing else, you’re gonna hafta do some math to combine those values, and I’m a bit (read: very) rusty on how to operate on logprobs. You certainly can’t just drop one of the 'Yes' entries after the calculation.

Hopefully someone else is more familiar, bc I don’t want to risk giving you incorrect info.

I’m still confused about why it’s still trying to add a space char when there already is one… :face_with_spiral_eyes:
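For what it's worth, the math to combine the two variants happens in probability space, not by adding the logprobs directly. A minimal sketch (the helper name is made up):

```python
import math

def merge_logprobs(top_logprobs):
    # Sum the probabilities (exp of the logprobs) of tokens that differ
    # only in leading whitespace, e.g. ' Yes' and 'Yes'.
    merged = {}
    for token, logprob in top_logprobs.items():
        key = token.strip()
        merged[key] = merged.get(key, 0.0) + math.exp(logprob)
    return merged

combined = merge_logprobs({' Yes': -4.965048, 'Yes': -0.00767172})
print(combined)  # the two 'Yes' variants collapse into one entry near 0.999
```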

If you want exact outputs like the old endpoint, you are probably going to need to use the embedding classification method instead of just relying on GPT

It will mean a bit more work at your end, but you look like you are keen to get it working.

Let me know if you can’t figure out how the classifiers in embedding work. I may be able to help - but not for a couple of days (snowed under right now)
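For anyone reading along, the embedding-classification idea can be sketched roughly like this. The similarity helper and toy vectors are illustrative assumptions, not the exact method suggested above; in practice the vectors would come from an embeddings endpoint such as openai.Embedding.create:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(answer_embedding, label_embeddings):
    # Pick the label whose embedding is most similar to the answer's.
    return max(label_embeddings,
               key=lambda lbl: cosine_similarity(answer_embedding, label_embeddings[lbl]))

# Toy 2-d vectors stand in for real embeddings here.
label_embeddings = {"Yes": [1.0, 0.1], "No": [0.1, 1.0]}
print(classify([0.9, 0.2], label_embeddings))  # -> Yes
```

With real embeddings you would embed each label once up front and compare every model answer against them, which sidesteps the token-variant problem entirely.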


Hi @raymonddavey ,
Thanks for your feedback and your offer to help. The toy dataset I shared above has a GroundTruth (labelled) column for a classification task, but the real-world problem I am applying this to has NO GroundTruth column, so we have to use OpenAI to generate the labelled response. Anyway, thanks again; I will continue with this Completion approach.

This post is old, but I'm sharing an approach to computing a probability score for the classification use-case, in case it's helpful.
#Reference: Getting the Most Out of GPT-3-based Text Classifiers: Part 3 | Label Probabilities and Multi-Label Output | Edge Analytics

import math

def logprob_to_prob(logprob: float) -> float:
    """Convert a log probability to a probability between 0.0 and 1.0."""
    return math.exp(logprob)

def prob_for_label(label: str, logprobs) -> float:
    """Returns the predicted probability for the given label between 0.0 and 1.0."""
    # Initialize probability for this label to zero.
    prob = 0.0
    # Look at the first entry in logprobs. This represents the
    # probabilities for the very next token.
    next_logprobs = logprobs[0]
    for s, logprob in next_logprobs.items():
        # We want labels to be considered case-insensitive. In
        # other words:
        #     prob_for_label("vegetable") =
        #         prob("vegetable") + prob("Vegetable")
        s = s.lower().strip()
        if label.lower() == s:
            # If the prediction matches one of the labels, add
            # the probability to the total probability for that
            # label.
            prob += logprob_to_prob(logprob)
        elif label.lower().startswith(s):
            # If the prediction is a prefix of one of the labels, we
            # need to recur. Multiply the probability of the prefix
            # by the probability of the remaining part of the label.
            # In other words:
            #     prob_for_label("vegetable") =
            #         prob("vege") * prob("table")
            rest_of_label = label[len(s):]
            remaining_logprobs = logprobs[1:]
            prob += logprob_to_prob(logprob) * prob_for_label(
                rest_of_label, remaining_logprobs)
    return prob
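Applying that routine to the raw response posted earlier in the thread ties it together. Here is a self-contained condensed restatement for a quick check (same logic, compacted so it runs on its own):

```python
import math

def prob_for_label(label, logprobs):
    # Condensed restatement of the routine above: sum probabilities of
    # case/whitespace variants, recurring on prefixes of the label.
    prob = 0.0
    for s, lp in logprobs[0].items():
        s = s.lower().strip()
        if label.lower() == s:
            prob += math.exp(lp)
        elif s and label.lower().startswith(s) and len(logprobs) > 1:
            prob += math.exp(lp) * prob_for_label(label[len(s):], logprobs[1:])
    return prob

# top_logprobs taken from the raw API response posted earlier in the thread.
logprobs = [{' Yes': -4.965048, 'Yes': -0.00767172},
            {'<|endoftext|>': -0.00018343095, '\\': -9.280664}]
print(prob_for_label("Yes", logprobs))  # ~0.999: both 'Yes' variants combined
```

This yields a single calibrated score per label, which resolves the duplicate ' Yes'/'Yes' entries that started the thread.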