I’m looking at the answers endpoint to try to return a “factual” statement from GPT-3. Before I start uploading files, though, I thought I’d try a simple example. For some reason, ada, curie, and davinci get it “wrong”, while babbage nails it.
Example Python request:
import openai

answer = openai.Answer.create(
    search_model="ada",
    model="ada",
    documents=["Brian wears shorts on Thursdays.", "Brian wears pants on Fridays.", "Brian wears a kilt Monday, Tuesday and Wednesday.", "Brian like dishwashers."],
    question="Does Brian wear kilts on Fridays?",
    examples_context="In 2017, U.S. life expectancy was 78.6 years.",
    examples=[["What is human life expectancy in the United States?", "Seventy-eight long and luxurious years."]],
    max_rerank=10,
    max_tokens=30,
    temperature=0.0,
    stop=["END", "\n", "."],
    return_prompt=True
)
This answers “yes”.
Originally, I had “Brian wears a kilt everyday, except Thursdays and Fridays”, but even the rewording above still gives the wrong answer. The exception is model=babbage, which always gets it right, irrespective of the search_model value.
Finally, asking
question="It is Friday, what is Brian wearing?",
tells me “kilt”.
Is there something off with these simple documents and sentences?
I found that a max_rerank of 1 does return the correct answer with ada and curie, though. Are we required to look at the score (if one exists) to figure out a “correct” answer?
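One way to see which documents the search step favors is to run a search separately and sort the results by score. The snippet below is only a rough sketch of that idea: it makes no API call, `top_documents` and `top_k` are hypothetical names, the scores are made up for illustration, and the `{"data": [{"document": ..., "score": ...}]}` shape is assumed to match what the search endpoint returns. Whether the answers endpoint applies the same cutoff internally is not something I can confirm.

```python
# Sketch only: `search_response` imitates an assumed search result shape.
# No request is sent, and the scores below are invented for illustration.

def top_documents(search_response, documents, top_k):
    """Return (score, text) pairs for the top_k best-scoring documents."""
    ranked = sorted(
        search_response["data"],
        key=lambda result: result["score"],
        reverse=True,
    )
    return [(r["score"], documents[r["document"]]) for r in ranked[:top_k]]


documents = [
    "Brian wears shorts on Thursdays.",
    "Brian wears pants on Fridays.",
    "Brian wears a kilt Monday, Tuesday and Wednesday.",
    "Brian like dishwashers.",
]

# Hypothetical scores, NOT real model output.
search_response = {
    "data": [
        {"document": 0, "score": 40.0},
        {"document": 1, "score": 120.0},
        {"document": 2, "score": 95.0},
        {"document": 3, "score": 5.0},
    ]
}

for score, text in top_documents(search_response, documents, top_k=1):
    print(f"{score:.1f}  {text}")
```

With a cutoff of 1, only the single best-scoring document would reach the completion step, which may be why a small max_rerank changes the answer.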
Interesting example, thanks for sharing. I believe you’ll need several more trials to identify a model, or a pair of models, that serves your dataset well. It may be useful to remember that the classifications endpoint is “just” a few-shot completion after a search.
Depending on the real data that you have, a max_rerank of 10 can become rather expensive.
Worth trying the former “instruct” models on such a task, for the completions part.
It is very important to be sure of what works before you have millions of users, so properly randomising your validation and test sets is key.
Quick note: I don’t think I’m using a classifications system in this process. If so, it’s hidden from me…
As I continue to test, I’m finding something else odd - when openai.Engine(search_model).search() operates, it seems to break documents by \n. So, in my sample dataset, I have a sentence that has \n in it before a period. Setting the stop_sequence to something else doesn’t seem to override this. Is it expected behavior to assume that a \n is the “end of a thing”?
Learned something new: the FILE purpose of “answers” breaks things differently than “search”. Search keeps the newlines in the result, whereas answers seems to do its cutting on the newline. So, for fine-tuned filtering, it’s a search upload that is needed, not an answers upload.
EDIT: Possible red herring; the stop_sequence was hardcoded in my script as ["\n", "."]
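To illustrate why a hardcoded stop list of ["\n", "."] can look like documents being split on newlines, here is a small sketch that mimics the effect of stop sequences on a piece of text. `apply_stop` is a hypothetical helper written for this post, not part of the openai library, and it only approximates the behaviour by cutting at the earliest match.

```python
def apply_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# A sentence with a newline before its period, as in the sample dataset.
sample = "Brian wears pants\non Fridays. He likes dishwashers."

print(repr(apply_stop(sample, ["\n", "."])))  # 'Brian wears pants'
print(repr(apply_stop(sample, ["END"])))      # the whole sample, unchanged
```

With "\n" in the stop list, the text ends at the newline even though the sentence continues, so the truncation comes from the stop setting and not necessarily from how the file was split.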