Clarifications for `openai.answers` - Replacement / Transition Guide

Hi,

Earlier I was using the openai.answers.create API endpoint to get binary Yes/No answers to questions that validate alternative names. Please see the code block below to reproduce my example case.

################
"""
A Set of wrapper functions built around the OpenAI API functions.
Source: https://beta.openai.com/docs/api-reference/introduction
Author: Dilip Rajkumar
"""
################
import os
import ast
import json
import openai
import logging
import pandas as pd
from typing import List, Union, Optional


## GLOBAL VARIABLES
# openai.api_key = os.getenv("OPENAI_API_KEY")
# default_user = "<INSERT USER ID>"


## SETUP LOGGER
logging.basicConfig(filename='../OpenAI_QnA_Query.log', filemode='a', format='%(asctime)s - %(funcName)s - %(levelname)s - %(message)s', level=logging.INFO)
log = logging.getLogger(__name__)


## ANSWERS
def get_answers_openai(qstn : str, docs : list = [], search_model : str = "ada", 
                       model : str = "curie", temperature : float = 0.01, 
                       examples : List = None, example_context : str = "", log_probs : int = None,
                       stop : Union[list, str] =  ["\n", "<|endoftext|>"]) -> tuple:
    """
    Answers the specified question using the provided documents and examples.
    
    Parameters
    ==========
    Refer: https://beta.openai.com/docs/api-reference/answers/create
    
    Returns
    =======
    tuple     :  (list,dict) answers from the model response, and the complete model response.
    """
    response = openai.Answer.create(search_model= search_model, model= model, 
                                    question = qstn, documents=docs, 
                                    logprobs = log_probs, examples_context=example_context,
                                    examples=[examples],temperature = temperature, 
                                    max_tokens=5, stop=stop)
    answers = response["answers"]
    return answers, response


def get_openai_batch_answers(df : pd.DataFrame, qstn_middle_phrase : str = "also known as",
                            ex_context : str = "", ex_qstn : str = " ", ex_answer : str = " ") -> pd.DataFrame:
    """
    Call the `get_answers_openai` method to get batch responses
    """
    qna_dict = {"PrefName":[], "AlternateName" :[], "questions" : [], "GroundTruth":[] ,"answers" : [], "top_prob_ans" : []}
    for i in range(len(df)):
        pref_name = df.Pref_Name.iloc[i]
        synonym = df.Alternate_Name.iloc[i]
        nr_docs = df.nr_docs_final.iloc[i]
        ground_truth = df.GroundTruth.iloc[i]
        question = f"Is {pref_name} {qstn_middle_phrase} {synonym}?"
        
        qna_dict["PrefName"].append(pref_name)
        qna_dict["AlternateName"].append(synonym)
        qna_dict["GroundTruth"].append(ground_truth)
        qna_dict["questions"].append(question)
        
        if nr_docs != 0:
            doc = str(df.documents.iloc[i])
        else:
            doc = ''
        answers, response = get_answers_openai(question, docs = [doc], model = 'davinci', examples = [ex_qstn, ex_answer], example_context = ex_context, log_probs = 2)
        top_probs= dict(response['completion']["choices"][0]['logprobs']['top_logprobs'][0])
        if "answers" in list(response.keys()):
            success_message = f"Response received for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            # print(success_message)
            log.info(success_message)
        else:
            error_msg = f"ERROR in response for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            print(error_msg)
            log.error(error_msg)
        
        qna_dict["answers"].append(answers[0])
        qna_dict["top_prob_ans"].append(top_probs)
    return pd.DataFrame(qna_dict)

### Example Questions and Context to `Get Answers` for Steering Responses
sample_context = "Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface."
sample_qstn    = "Is sodium chloride also known as common salt?"
sample_answer  = "Yes"

if __name__ == "__main__":
    url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTGuONBhr1yefCz619MugO9aT2ATxWcvsL41W4ZLYrIBCjdeHPXJthX8OMVrCnTpSHF6T4Cv_ujkP86/pub?gid=0&single=true&output=csv"
    df = pd.read_csv(url)
    df_response = get_openai_batch_answers(df, 
                                           ex_context = sample_context, 
                                           ex_qstn = sample_qstn, 
                                           ex_answer = sample_answer
                                          )
    df_response.to_csv("OpenAI_Sample_QnA_responses.csv", index = False)
    print(df_response)

Input Dataframe:

In the input data frame file, the documents column contains a list of sentences which are used to “train” the OpenAI model.
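
For reference, here is a minimal sketch of the columns the script expects (the values below are made up for illustration; the real data comes from the linked Google Sheet):

import pandas as pd

# Hypothetical single-row example of the expected input schema. Column names match
# what the script reads: Pref_Name, Alternate_Name, nr_docs_final, GroundTruth and
# documents (a stringified list of context sentences).
sample_df = pd.DataFrame({
    "Pref_Name": ["Fox"],
    "Alternate_Name": ["Vulpes vulpes"],
    "nr_docs_final": [1],
    "GroundTruth": ["Yes"],
    "documents": ["['The red fox (Vulpes vulpes) is the largest of the true foxes.']"],
})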

I am now trying to transition to Option 2 in your answers transition guide. Could anyone kindly clarify the following questions and provide some guidance on transitioning from this deprecated API to completions (or something similar) that will give me the desired output (see the last question in this post)?

1.) The sentences I feed into the documents column for each data point are already filtered and cleaned to some extent, so is openai.search.create needed for my QnA requirement? If so, can you point me to the correct transition guide for the search replacement, because the current link in your transition guide seems to point to a wrong Intercom app website (see below):

Do an OpenAI search (note that this is also being deprecated and has a transition guide where the documents are the user provided documents and the query is the query from above. Rank the documents by score.


I also have some questions regarding the def answers() function in the Option 2 Python script provided in your transition guide:

2.) What is this examples_context argument? Is it some kind of title or label for what the QnA bot is supposed to do, or does it have some important significance beyond a label/title?

examples_context="I am a bot that names country capitals"

3.) Can I pass a list of sentences in this documents argument, as I currently do in my present implementation using the openai.answers API endpoint? What is the maximum number of sentences in this list, and what is the word-count limit per sentence (or in total) for this documents argument?

documents=["I am a bot that names country capitals"]

4.) What is this alternative_question argument? What am I supposed to feed in here?

alternative_question="different test"

5.) The API documentation for completions mentions that a maximum total of 2048 tokens can be fed into the API request (I am assuming via the documents argument). So what is this max_tokens=16 argument doing? Is it some kind of window length for the OpenAI model to traverse through in chunks?

6.) If I just provide one example (a positive YES case) for the example context, example question and answer, does it bias all the OpenAI responses towards YES?

### Example Questions and Context to `Get Answers` for Steering Responses
sample_context = "Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface."
sample_qstn = "Is sodium chloride also known as common salt?"
sample_answer = "Yes"

7.) After executing the above Python script we get the following data frame output. How do I improve the OpenAI-predicted answers so that they more accurately match the GroundTruth column?

          PrefName           AlternateName                                          questions GroundTruth answers                               top_prob_ans
0              Fox           Vulpes vulpes                Is Fox also known as Vulpes vulpes?         Yes     Yes    {' No': -1.4927609, ' Yes': -0.2767575}
1           Bleach     Sodium Hypochlorite       Is Bleach also known as Sodium Hypochlorite?         Yes     Yes    {' No': -1.6149805, ' Yes': -0.2529657}
2     Blue Vitriol         Copper Sulphate     Is Blue Vitriol also known as Copper Sulphate?         Yes     Yes    {' No': -1.2375259, ' Yes': -0.3838043}
3        Wild boar              Sus scrofa             Is Wild boar also known as Sus scrofa?         Yes     Yes   {' No': -1.8056873, ' Yes': -0.20204972}
4           Bamboo     Phyllostachys aurea       Is Bamboo also known as Phyllostachys aurea?         Yes     Yes    {' No': -0.8268861, ' Yes': -0.6317986}
5          Cheetah     Catopuma temminckii      Is Cheetah also known as Catopuma temminckii?          No      No  {' No': -0.70515376, ' Yes': -0.73280275}
6        Black rat   Oryctolagus cuniculus  Is Black rat also known as Oryctolagus cuniculus?          No     Yes   {' No': -1.6201084, ' Yes': -0.24748868}
7          Calomel         Silicon Carbide          Is Calomel also known as Silicon Carbide?          No     Yes   {' No': -0.7262988, ' Yes': -0.72919035}
8    African daisy  Sansevieria cylindrica  Is African daisy also known as Sansevieria cyl...          No     Yes    {' No': -1.2350233, ' Yes': -0.3873096}
9  Cream of Tartar      Sodium Bicarbonate  Is Cream of Tartar also known as Sodium Bicarb...          No     Yes    {' No': -1.085476, ' Yes': -0.44409192}

I would also greatly appreciate any guidance or suggestions on my code for making the transition from the deprecated answers endpoint to the new completions endpoint.

Best Regards,
Dilip

Hi! Lemme see if I can answer those in order:

  1. Search transition guide (Search Transition Guide | OpenAI Help Center) → " Option 2: Reimplement existing functionality" → Should point to this snippet: https://github.com/openai/openai-cookbook/blob/fd4e31bb000d4e86d5251d979ccff285160bc32d/transition_guides_for_deprecated_API_endpoints/search_functionality_example.py. That should be a near recreation of the existing search endpoint
  2. It’s mostly a title. Prompting at the beginning to tell the model what it’s supposed to be helps a lot with getting expected results.
  3. Should, yes! The maximum number of sentences is bounded entirely by the context length, which for most models is 2048 tokens. If you're curious about how to count the tokens, the just-released tiktoken library (https://github.com/openai/tiktoken) might help (see the short token-counting sketch after this list).
  4. We used alternative_question as a way for users to provide a different string to do openai.search instead of the question posed in the completion text because sometimes you want those two things to be different. You don’t have to use it.
  5. max_tokens in this case represents the maximum number of tokens to generate from the completions request.
  6. Yep! Though we’d encourage some experimentation here. We usually recommend at least 1 example of each class you’re choosing between
  7. At a first glance, you probably want some negative examples in your prompt. Right now the model thinks that the answer to everything is "Yes". I would recommend giving examples that include some "No"s or even some "I don't know"s.
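
Since a couple of these questions touch on token budgets (points 3 and 5), here's a quick sketch of counting prompt tokens with tiktoken and checking that the prompt plus the requested completion fits under the 2048-token context limit (the "gpt2" encoding is just an example; use whichever encoding matches your model):

import tiktoken

MAX_CONTEXT_TOKENS = 2048   # context window shared by the prompt and the completion
max_tokens = 16             # maximum number of tokens the model is asked to generate

enc = tiktoken.get_encoding("gpt2")  # assumption: pick the encoding for your model

prompt = "Is Fox also known as Vulpes vulpes?\nA:"
n_prompt_tokens = len(enc.encode(prompt))

# The prompt (instruction + context documents + question) plus the completion
# must fit within the context window.
assert n_prompt_tokens + max_tokens <= MAX_CONTEXT_TOKENS
print(n_prompt_tokens)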

Hope this helps!


Hi Hallacy,
Many Thanks for your replies.

1.) My first question earlier remains partially unanswered.

The sentences I feed into the documents column for each data point are already filtered and cleaned to some extent using custom data engineering, so is openai.search.create needed for my QnA requirement?

2.) I looked at the new search transition guide and I see it is also using the completions.create API endpoint, which is again used in the new answers transition guide. Is there any way the OpenAI staff can optimise the code in the def semantic_search() function in the new answers transition guide so that it incorporates the new search transition guide and makes an optimised call to the completions.create API endpoint (ideally once) in order to save costs?

I used the new answers transition guide to produce the code below:
openai_answers_new.py

import ast
import openai
import logging
import pandas as pd
from pprint import pprint
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

MAX_TOKENS_LIMIT = 2048
ANSWERS_INSTRUCTION = "Please answer the question according to the above context.\n"
CONTEXT_TEMPLATE = "===\nContext: {context}\n===\n"

## SETUP LOGGER
logging.basicConfig(filename='../OpenAI_QnA_Query.log', filemode='a', format='%(asctime)s - %(funcName)s - %(levelname)s - %(message)s', level=logging.INFO)
log = logging.getLogger(__name__)

def extract_instruction(instruction):
    """
    Extract the `instruction` parameter and format it properly.
    If it does not exist, return an empty string.
    """
    if instruction is None:
        return ""

    return f"{instruction.strip()}\n\n"


def semantic_search(
    search_model, query_for_search, file_id=None, max_documents=None, examples=None
):
    """
    :param examples: A list of {"text":...} or {"text": ..., "label": ...}.
    :return:
        a list of semantic search result dict of documents sorted by "score":
        [
            {
                "document": ...,
                "object": "search_result",
                "score": ...,
                "text": ...,
            },
            ...
        ]
    """
    assert (examples is None) ^ (file_id is None)  # xor

    if file_id is not None:
        # This is where you'd do an elastic search call.  Since there isn't an example of this
        # we can query, we'll raise an error.
        # The return value from this would be a list of examples
        raise NotImplementedError()

    # This isn't quite accurate since Search is also being deprecated. See our search guide for more
    # information.

    search_result = openai.Search.create(
        model=search_model,
        documents=[x["text"] for x in examples],
        query=query_for_search,
    )

    info_dict = {d["document"]: d for d in search_result["data"]}
    sorted_doc_ids = sorted(
        info_dict.keys(), key=lambda x: info_dict[x]["score"], reverse=True
    )
    if max_documents:
        sorted_doc_ids = sorted_doc_ids[:max_documents]
    return [info_dict[i] for i in sorted_doc_ids]


def select_by_length(
    sorted_doc_infos,
    max_token_len,
    lambda_fn=None,
):
    """
    Give a list of (document ID, document content in string), we will select as many
    documents as possible as long as the total length does not go above `max_token_len`.
    :param sorted_doc_infos: A list of semantic search result dict of documents sorted by "score".
    :param max_token_len: The maximum token length for selected documents.
    :param lambda_fn: A function that takes in search results dict and output a formatted
        example for context stuffing.
    :return: A tuple of (
        A concatenation of selected documents used as context,
        A list of selected document IDs
    )
    """
    if not sorted_doc_infos:
        return "", []

    selected_indices = []
    total_doc_tokens = 0
    doc_dict = {}
    for i, doc_info in enumerate(sorted_doc_infos):
        doc = lambda_fn(doc_info) if lambda_fn else doc_info["text"]
        n_doc_tokens = len(tokenizer.encode(doc))
        if total_doc_tokens + n_doc_tokens < max_token_len:
            total_doc_tokens += n_doc_tokens
            selected_indices.append(i)
            doc_dict[i] = doc

    # The top ranked documents should go at the end.
    selected_indices = selected_indices[::-1]

    context = "".join([doc_dict[i] for i in selected_indices])
    selected_doc_infos = [sorted_doc_infos[i] for i in selected_indices]
    return context, selected_doc_infos


def answers(
    examples,
    question,
    model,
    examples_context,
    file_id=None,
    documents=None,
    logit_bias=None,
    max_rerank=200,
    max_tokens=16,
    alternative_question=None,
    search_model="ada",
    temperature=0.0,
    logprobs=2,
    stop=None,
    n=1
):
    """
    Given a prompt, a question, a list of (question, answer) pairs as examples, and
    a list of documents for context, it tries to include all the QA examples and top
    relevant context documents.
    The constructed prompt for the final completion call:
    ```
    Please answer the question according to the above context.
    ===
    Context: {{ the context for example QA pairs. }}
    ===
    Q: example 1 question
    A: example 1 answer
    ---
    Q: example 2 question
    A: example 2 answer
    ===
    Context: {{ a list of relevant documents sorted via search(question, documents) }}
    ===
    Q: question
    A:
    ```
    The returned object has a structure like:
    {
      "answers": [
        "Beijing",
        "Beijing, China"
      ],
      "completion_id": "xxx-xxx",
      "object": "answer",
      "selected_documents": [
        {
            "document": ...,    # document index, same as in search/ results.
            "object": "search_result",
            "text": ...,
        },
        ...
      ],
    }
    """

    examples = examples if examples else []

    example_prompts = [f"Q: {x}\nA: {y}" for x, y in examples]
    prompt = f"Q: {question}\nA:"

    # Append all the QA examples into the prompt.
    if examples_context:
        examples_context = CONTEXT_TEMPLATE.format(context=examples_context)
    instruction = (
        ANSWERS_INSTRUCTION + examples_context + "\n---\n".join(example_prompts) + "\n"
    )

    logit_bias = logit_bias if logit_bias is not None else {}

    if file_id is None and documents is None:
        raise Exception("Please submit at least one of `documents` or `file`.")
    if file_id is not None and documents is not None:
        raise Exception("Please submit only one of `documents` or `file`.")

    instruction = extract_instruction(instruction)

    n_instruction_tokens = len(tokenizer.encode(instruction))
    n_prompt_tokens = len(tokenizer.encode(prompt))
    n_query_tokens = len(tokenizer.encode(question))
    n_context_tokens = len(tokenizer.encode(CONTEXT_TEMPLATE.format(context="")))

    if documents is not None:
        documents = [doc.strip() + " " for doc in documents]
        n_docs_tokens = [len(tokenizer.encode(doc)) for doc in documents]

    # After accounting for all the required content, how many tokens are left for context stuffing.
    leftover_token_len = MAX_TOKENS_LIMIT - (
        n_instruction_tokens + n_context_tokens + n_prompt_tokens + max_tokens
    )
    sorted_doc_infos = []

    question_for_search = (
        alternative_question if alternative_question is not None else question
    )
    if file_id is not None:
        # Note: `semantic_search` returns a single list (and currently raises
        # NotImplementedError for the file_id path), so don't unpack two values here.
        sorted_doc_infos = semantic_search(
            search_model,
            question_for_search,
            file_id=file_id,
            max_documents=max_rerank,
        )

    elif len(documents) == 0:
        # If no context document is provided, do nothing.
        pass

    elif min(n_docs_tokens) >= leftover_token_len:
        # If there is no room for adding any context doc.
        pass

    elif (max_rerank is None or max_rerank >= len(documents)) and sum(
        n_docs_tokens
    ) < leftover_token_len:
        # If the total length of docs is short enough to be added all.
        selected_indices = list(range(len(documents)))

        sorted_doc_infos = [
            {"document": i, "text": documents[i]} for i in selected_indices
        ]

    elif n_query_tokens + max(n_docs_tokens) >= MAX_TOKENS_LIMIT:
        # If the prompt and the longest document together go above the limit.
        total_tokens = n_query_tokens + max(n_docs_tokens)
        raise Exception(
            f"The longest document and prompt pair together contains {total_tokens} "
            f"tokens, above the limit {MAX_TOKENS_LIMIT} for semantic search. Please consider "
            f"shortening the prompt or the longest document."
        )

    else:
        # If we can add some context documents but not all of them, we should
        # query search endpoint to rank docs by score.
        sorted_doc_infos = semantic_search(
            search_model,
            question_for_search,
            examples=[{"text": doc} for doc in documents],
            max_documents=max_rerank,
        )

    # Select documents w.r.t. the context length limitation.
    context, sorted_doc_infos = select_by_length(
        sorted_doc_infos,
        leftover_token_len,
        lambda_fn=lambda x: x["text"].strip() + " ",
    )

    # Add instruction before the context and the prompt after the context.
    if context:
        context = CONTEXT_TEMPLATE.format(context=context.strip())
    full_prompt = instruction + context + prompt

    completion_result = openai.Completion.create(
        engine=model,
        prompt=full_prompt,
        logit_bias=logit_bias,
        temperature=temperature,
        n=n,
        max_tokens=max_tokens,
        stop=stop,
        logprobs=logprobs,
    )
    
    top_prob_answers = completion_result["choices"][0]['logprobs']["top_logprobs"][0]
    completion_result["selected_documents"] = sorted_doc_infos
    result = dict(
        object="answer",
        selected_documents=completion_result.pop("selected_documents"),
        completion=completion_result["id"],
    )
    
    result["top_logprobs"] = dict(top_prob_answers)
    
    result["answers"] = [
        item["text"].replace("A:", "").split("Q:")[0].strip()
        for item in completion_result["choices"]
    ]

    return result


qna_examples = [
    ["Is sodium chloride also known as common salt?", "Yes"],
    ["Is Chinkara also known as Antelope cervicapra?", "No"],
]

## Get OpenAI Batch Answers
def get_openai_batch_answers(df : pd.DataFrame) -> pd.DataFrame:
    """
    Call the `answers` function defined above to get batch responses.
    """
    qna_dict = {"PrefName":[], "AlternateName" :[], "questions" : [], "nr_docs" : [] ,"GroundTruth":[] ,"answers" : [], "top_prob_ans" : []}
    for i in range(len(df)):
        pref_name = df.Pref_Name.iloc[i]
        synonym = df.Alternate_Name.iloc[i]
        nr_docs = df.nr_docs_final.iloc[i]
        ground_truth = df.GroundTruth.iloc[i]
        question = f"Is {pref_name} also known as {synonym}?"
        alt_question = f"Is {synonym} a synonym of {pref_name}?"
        qna_dict["PrefName"].append(pref_name)
        qna_dict["AlternateName"].append(synonym)
        qna_dict["nr_docs"].append(nr_docs)
        qna_dict["GroundTruth"].append(ground_truth)
        qna_dict["questions"].append(question)
        
        if nr_docs != 0:
            # Assumes the CSV stores `documents` as a stringified list of sentences;
            # a bare string here would be iterated character-by-character inside `answers`.
            doc = ast.literal_eval(str(df.documents.iloc[i]))
        else:
            doc = []
            
        response = answers(examples = qna_examples, question = question, model = "davinci",
                           examples_context = "I am a bot that validates synonyms",
                           documents=doc, max_tokens=2,
                           alternative_question=alt_question,
                           search_model="curie",
                           temperature=0.1, logprobs=2, stop=["\n\n"], n=1)
        
        if "answers" in list(response.keys()):
            success_message = f"Response received for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            # print(success_message)
            log.info(success_message)
        else:
            error_msg = f"ERROR in response for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            print(error_msg)
            log.error(error_msg)
        
        qna_dict["answers"].append(response["answers"][0].replace('\n---',''))
        qna_dict["top_prob_ans"].append(response["top_logprobs"])
    return pd.DataFrame(qna_dict)

I set max_tokens=2 in my API call as I just need either a Yes or No answer, and I included an example for both the Yes and No cases in the above code block (see the variable qna_examples).
I have been playing with the documents which I provide for contextual relevance to guide (or “train”) the OpenAI completions model before it gives a Yes or No answer to validate synonyms, and the following are my observations:

  • Unless one explicitly specifies that A is not also known as B (see the Calomel and Silicon Carbide case at the end of the Jupyter notebook in this Google Drive folder), openai.completions does not provide the correct answer.
  • Even the document evidence for the African Daisy case is not picked up by OpenAI, where I explicitly state in the documents "… and is not to be confused with the african daisy".
  • Also, OpenAI itself seems to have been trained on some inaccurate data (see the Black Rat and Oryctolagus cuniculus case, where OpenAI ChatGPT says a black rat is also known as the European rabbit).

Is feeding in relevant documents which explicitly state that A is a synonym of B, or that A is not a synonym of B, the only way to get correct predictions from the openai.completions API endpoint?
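
To make the question concrete, here is roughly the contrast I am seeing with the Calomel / Silicon Carbide pair (the sentences below are paraphrased from my notebook, not verbatim):

question = "Is Calomel also known as Silicon Carbide?"   # ground truth: No

# A document that only describes Calomel implicitly; with this the model still answers "Yes".
implicit_docs = ["Calomel is a mineral form of mercury(I) chloride used in early medicine."]

# A document that states the negative fact explicitly; only then does the model answer "No".
explicit_docs = ["Calomel is a mineral form of mercury(I) chloride and is not also known as Silicon Carbide."]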

Best Regards,
Dilip

Greetings,
Is it possible that any OpenAI staff can update/modify the def semantic_search() function in the new answers transition guide, so that the deprecated openai.search API endpoint is not used?

If I look at the transition guide for replacing search, it is again using the openai.completions endpoint, so I am not sure whether it is efficient to be using the openai.completions API endpoint once for sorting and ranking relevant docs and again for getting answers.
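
In case it helps frame the question, here is a rough sketch of how I imagine semantic_search() could be reimplemented without openai.search, using the embeddings endpoint and cosine similarity instead (the embedding model name and the scoring are my assumptions, not something taken from the guide):

import numpy as np
import openai

def semantic_search_via_embeddings(query_for_search, documents, max_documents=None,
                                   embedding_model="text-embedding-ada-002"):
    """Rank `documents` against the query using embeddings instead of openai.Search."""
    texts = [query_for_search] + list(documents)
    response = openai.Embedding.create(model=embedding_model, input=texts)
    vectors = [np.array(item["embedding"]) for item in response["data"]]
    query_vec, doc_vecs = vectors[0], vectors[1:]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    results = [
        {"document": i, "object": "search_result", "score": cosine(query_vec, v), "text": doc}
        for i, (doc, v) in enumerate(zip(documents, doc_vecs))
    ]
    results.sort(key=lambda d: d["score"], reverse=True)
    return results[:max_documents] if max_documents else results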

What do I need to do (in the new answers transition guide) if I just want to pass a list of sentences (documents) to the openai.completions API endpoint without doing any search or ranking of documents?
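
To be concrete, this is the kind of single-call flow I am after when the documents already fit in the context window (a minimal sketch that reuses the prompt template from the transition guide; the function name and defaults are mine):

import openai

ANSWERS_INSTRUCTION = "Please answer the question according to the above context.\n"
CONTEXT_TEMPLATE = "===\nContext: {context}\n===\n"

def answer_without_search(question, documents, examples, examples_context,
                          model="davinci", max_tokens=2, logprobs=2):
    """Stuff all documents into the prompt and call Completion.create exactly once."""
    example_prompts = "\n---\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    full_prompt = (
        ANSWERS_INSTRUCTION
        + CONTEXT_TEMPLATE.format(context=examples_context)
        + example_prompts + "\n"
        + CONTEXT_TEMPLATE.format(context=" ".join(d.strip() for d in documents))
        + f"Q: {question}\nA:"
    )
    # No openai.Search / semantic_search call at all: one completions request per question.
    return openai.Completion.create(
        engine=model, prompt=full_prompt, temperature=0.0,
        max_tokens=max_tokens, logprobs=logprobs, stop=["\n\n"],
    )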

Best Regards