Hi,
Earlier I was using the openai.Answer.create API endpoint to get binary Yes/No answers to some questions that validate alternative names. Please see the code block below to reproduce my example case.
################
"""
A Set of wrapper functions built around the OpenAI API functions.
Source: https://beta.openai.com/docs/api-reference/introduction
Author: Dilip Rajkumar
"""
################
import os
import ast
import json
import openai
import logging
import pandas as pd
from typing import List, Union, Optional
## GLOBAL VARIABLES
# openai.api_key = os.getenv("OPENAI_API_KEY")
# default_user = "<INSERT USER ID>"
## SETUP LOGGER
logging.basicConfig(filename='../OpenAI_QnA_Query.log', filemode='a', format='%(asctime)s - %(funcName)s - %(levelname)s - %(message)s', level=logging.INFO)
log = logging.getLogger(__name__)
## ANSWERS
def get_answers_openai(qstn: str, docs: list = [], search_model: str = "ada",
                       model: str = "curie", temperature: float = 0.01,
                       examples: List = None, example_context: str = "", log_probs: int = None,
                       stop: Union[list, str] = ["\n", "<|endoftext|>"]) -> tuple:
    """
    Answers the specified question using the provided documents and examples.

    Parameters
    ==========
    Refer: https://beta.openai.com/docs/api-reference/answers/create

    Returns
    =======
    tuple : (list, dict) answers from the model response, and the complete model response.
    """
    response = openai.Answer.create(search_model=search_model, model=model,
                                    question=qstn, documents=docs,
                                    logprobs=log_probs, examples_context=example_context,
                                    examples=[examples], temperature=temperature,
                                    max_tokens=5, stop=stop)
    answers = response["answers"]
    return answers, response
def get_openai_batch_answers(df: pd.DataFrame, qstn_middle_phrase: str = "also known as",
                             ex_context: str = "", ex_qstn: str = " ", ex_answer: str = " ") -> pd.DataFrame:
    """
    Call the `get_answers_openai` method to get batch responses.
    """
    qna_dict = {"PrefName": [], "AlternateName": [], "questions": [], "GroundTruth": [], "answers": [], "top_prob_ans": []}
    for i in range(len(df)):
        pref_name = df.Pref_Name.iloc[i]
        synonym = df.Alternate_Name.iloc[i]
        nr_docs = df.nr_docs_final.iloc[i]
        ground_truth = df.GroundTruth.iloc[i]
        question = f"Is {pref_name} {qstn_middle_phrase} {synonym}?"
        qna_dict["PrefName"].append(pref_name)
        qna_dict["AlternateName"].append(synonym)
        qna_dict["GroundTruth"].append(ground_truth)
        qna_dict["questions"].append(question)
        # Use the row's documents only when some were found for it.
        if nr_docs != 0:
            doc = str(df.documents.iloc[i])
        else:
            doc = ''
        answers, response = get_answers_openai(question, docs=[doc], model='davinci', examples=[ex_qstn, ex_answer], example_context=ex_context, log_probs=2)
        # Log probabilities of the top candidates for the first generated token.
        top_probs = dict(response['completion']["choices"][0]['logprobs']['top_logprobs'][0])
        if "answers" in response:
            success_message = f"Response received for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            # print(success_message)
            log.info(success_message)
        else:
            error_msg = f"ERROR in response for row index: {i} having pref_name:- {pref_name} and synonym:- {synonym}"
            print(error_msg)
            log.error(error_msg)
        qna_dict["answers"].append(answers[0])
        qna_dict["top_prob_ans"].append(top_probs)
    return pd.DataFrame(qna_dict)
### Example Questions and Context to `Get Answers` for Steering Responses
sample_context = "Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface."
sample_qstn = "Is sodium chloride also known as common salt?"
sample_answer = "Yes"
if __name__ == "__main__":
    url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTGuONBhr1yefCz619MugO9aT2ATxWcvsL41W4ZLYrIBCjdeHPXJthX8OMVrCnTpSHF6T4Cv_ujkP86/pub?gid=0&single=true&output=csv"
    df = pd.read_csv(url)
    df_response = get_openai_batch_answers(df,
                                           ex_context=sample_context,
                                           ex_qstn=sample_qstn,
                                           ex_answer=sample_answer
                                           )
    df_response.to_csv("OpenAI_Sample_QnA_responses.csv", index=False)
    print(df_response)
Input Dataframe:
In the input data frame file, the documents column contains a list of sentences that is used to “train” the OpenAI model.
I am now trying to transition to Option 2 in your answers transition guide. Could anyone kindly clarify the following questions and provide some guidance on transitioning from this deprecated API to the completions endpoint, or something similar, that will give me the desired output (see the last question in this post)?
1.) The sentences I feed in the documents column for each data point are already filtered and cleaned to some extent, so is openai.search.create needed for my QnA requirement? If so, can you point to the correct transition guide for the search replacement? The current link in your transition guide seems to point to a wrong Intercom app website (see the quoted text below).
"Do an OpenAI search (note that this is also being deprecated and has a transition guide) where the documents are the user provided documents and the query is the query from above. Rank the documents by score."
I also have some questions regarding the def answers() function in the Option 2 Python script provided in your transition guide:
2.) What is this examples_context argument? Is it some kind of title or label for what the QnA bot is supposed to do, or does it have some significance beyond a label/title?
examples_context="I am a bot that names country capitals"
3.) Can I pass a list of sentences in this documents argument, as I currently do in my present implementation using the openai.answers API endpoint? What is the maximum number of sentences in this list, and what is the word-count limit per sentence or in total for this documents argument?
documents=["I am a bot that names country capitals"]
4.) What is this alternative_question argument? What am I supposed to feed in here?
alternative_question="different test"
5.) The API documentation for completions mentions that a maximum total of 2048 tokens can be fed into the API request (I am assuming via the documents argument). So what is this max_tokens=16 argument doing? Is it some kind of window length for the OpenAI model to traverse through in chunks?
6.) If I provide just one example (a positive YES case) for the example context, example question and answer, does it bias all the OpenAI responses to YES?
### Example Questions and Context to `Get Answers` for Steering Responses
sample_context = "Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface."
sample_qstn = "Is sodium chloride also known as common salt?"
sample_answer = "Yes"
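One way to test for this bias would be to steer the model with one Yes example and one No example instead of a single positive case. The sketch below only builds such a prompt as a string and makes no API call. The helper name build_balanced_prompt, the "Context / Q / A" layout, and the table-sugar negative example are my own assumptions and are not taken from the transition guide.

```python
from typing import List, Tuple


def build_balanced_prompt(examples: List[Tuple[str, str, str]],
                          context: str, question: str) -> str:
    """Hypothetical helper: lay out (context, question, answer) examples before the real question."""
    parts = []
    for ex_context, ex_question, ex_answer in examples:
        parts.append(f"Context: {ex_context}\nQ: {ex_question}\nA: {ex_answer}")
    # The last block ends with "A:" so that the model completes the answer.
    parts.append(f"Context: {context}\nQ: {question}\nA:")
    return "\n\n".join(parts)


examples = [
    ("Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface.",
     "Is sodium chloride also known as common salt?", "Yes"),
    # Assumed negative example; it is not part of the original script.
    ("Table sugar (also known as sucrose) is extracted from sugar cane and sugar beet.",
     "Is table sugar also known as sodium chloride?", "No"),
]

prompt = build_balanced_prompt(examples, "", "Is Fox also known as Vulpes vulpes?")
print(prompt)
```

Comparing the predicted answers with and without the negative example should show whether the single Yes example is what pushes the responses toward Yes.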
7.) After executing the above Python script we get the following data frame output. How do I improve the OpenAI predicted answers so that they match the ground truth column more often?
   PrefName         AlternateName           questions                                          GroundTruth  answers  top_prob_ans
0  Fox              Vulpes vulpes           Is Fox also known as Vulpes vulpes?                Yes          Yes      {' No': -1.4927609, ' Yes': -0.2767575}
1  Bleach           Sodium Hypochlorite     Is Bleach also known as Sodium Hypochlorite?       Yes          Yes      {' No': -1.6149805, ' Yes': -0.2529657}
2  Blue Vitriol     Copper Sulphate         Is Blue Vitriol also known as Copper Sulphate?     Yes          Yes      {' No': -1.2375259, ' Yes': -0.3838043}
3  Wild boar        Sus scrofa              Is Wild boar also known as Sus scrofa?             Yes          Yes      {' No': -1.8056873, ' Yes': -0.20204972}
4  Bamboo           Phyllostachys aurea     Is Bamboo also known as Phyllostachys aurea?       Yes          Yes      {' No': -0.8268861, ' Yes': -0.6317986}
5  Cheetah          Catopuma temminckii     Is Cheetah also known as Catopuma temminckii?      No           No       {' No': -0.70515376, ' Yes': -0.73280275}
6  Black rat        Oryctolagus cuniculus   Is Black rat also known as Oryctolagus cuniculus?  No           Yes      {' No': -1.6201084, ' Yes': -0.24748868}
7  Calomel          Silicon Carbide         Is Calomel also known as Silicon Carbide?          No           Yes      {' No': -0.7262988, ' Yes': -0.72919035}
8  African daisy    Sansevieria cylindrica  Is African daisy also known as Sansevieria cyl...  No           Yes      {' No': -1.2350233, ' Yes': -0.3873096}
9  Cream of Tartar  Sodium Bicarbonate      Is Cream of Tartar also known as Sodium Bicarb...  No           Yes      {' No': -1.085476, ' Yes': -0.44409192}
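To make the top_prob_ans column easier to compare across rows, the log probabilities can be turned into a single Yes score. This is a minimal sketch, assuming the values are natural-log probabilities of the first generated token; the function name normalized_yes_probability is hypothetical, and the two sample dictionaries are copied from rows 0 and 5 of the output above.

```python
import math


def normalized_yes_probability(top_logprobs: dict) -> float:
    """Return P(Yes) / (P(Yes) + P(No)) from a dict of token log probabilities."""
    # Tokens carry a leading space (' Yes', ' No'), so strip it before matching.
    probs = {token.strip(): math.exp(logprob) for token, logprob in top_logprobs.items()}
    p_yes = probs.get("Yes", 0.0)
    p_no = probs.get("No", 0.0)
    total = p_yes + p_no
    if total == 0:
        raise ValueError("Neither 'Yes' nor 'No' is among the top tokens.")
    return p_yes / total


row_0 = {' No': -1.4927609, ' Yes': -0.2767575}    # Fox / Vulpes vulpes
row_5 = {' No': -0.70515376, ' Yes': -0.73280275}  # Cheetah / Catopuma temminckii

for name, logprobs in [("row 0", row_0), ("row 5", row_5)]:
    print(name, round(normalized_yes_probability(logprobs), 3))
```

Rows 5 and 7 have nearly equal Yes and No log probabilities, so their scores land close to 0.5; such rows could be flagged as uncertain rather than counted as a firm Yes or No.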
I would also greatly appreciate any guidance or suggestions on my code for making the transition from the deprecated answers endpoint to the new completions endpoint.
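As a starting point for that transition, here is a rough sketch of how the question, the documents and one example could be folded into a single completion prompt. It is not the transition guide's script. The function answer_yes_no, the prompt layout, and the `complete` callable are assumptions for illustration; `complete` stands in for whatever actually calls the completions endpoint, and a fake is used here so that the sketch runs offline.

```python
from typing import Callable, List


def answer_yes_no(question: str, documents: List[str],
                  complete: Callable[[str], str],
                  examples_context: str = "", example_q: str = "",
                  example_a: str = "") -> str:
    """Build one prompt from the documents and one example, then ask `complete` for the answer."""
    # Skip empty documents, as the original script passes '' when none are available.
    context = "\n".join(doc for doc in documents if doc)
    prompt = (
        "Answer the question with Yes or No, using the context.\n\n"
        f"Context: {examples_context}\nQ: {example_q}\nA: {example_a}\n\n"
        f"Context: {context}\nQ: {question}\nA:"
    )
    return complete(prompt).strip()


def fake_complete(prompt: str) -> str:
    # Stand-in for the real API call. With the legacy openai package this could
    # (as an assumption) wrap openai.Completion.create(engine=..., prompt=prompt,
    # max_tokens=5, temperature=0.01, logprobs=2, stop=["\n"]) and return
    # response["choices"][0]["text"].
    return " Yes"


answer = answer_yes_no(
    "Is Fox also known as Vulpes vulpes?",
    documents=["The red fox (Vulpes vulpes) is the largest of the true foxes."],
    complete=fake_complete,
    examples_context="Common Salt (also known as Sodium Chloride) is a mineral abundantly found on the earth's surface.",
    example_q="Is sodium chloride also known as common salt?",
    example_a="Yes",
)
print(answer)
```

Keeping the API call behind a callable means the batch loop in get_openai_batch_answers would only need to swap get_answers_openai for a function like this one, and the prompt construction can be tested without spending tokens.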
Best Regards,
Dilip