Does the answers endpoint actually use the metadata?

Quick question: is the metadata in the files used for the answers endpoint read by GPT-3 in any capacity when answering questions, or is the metadata just for our own convenience?

Just for your own convenience!

Thanks @hallacy, that’s what I figured out after experimenting. What about examples_context? The documentation does not really specify how it is used by GPT-3, or how I can use it to get better answers.

Short answer: It helps tell our models what kind of QA system it will be.

Longer answer: most of what the /answers endpoint does under the hood is automatically try to create a nicely formatted prompt for you. The easiest way to see this is to add return_prompt=true to your API call. There should be a prompt field in the response that shows you the exact prompt we send to the /completions endpoint.
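
Roughly something like this with the Python client (a quick sketch; the file ID, question, and examples are just placeholders):

```python
import openai

# Sketch of an /answers call that asks for the generated prompt back.
# The file ID, question, and examples are placeholders.
response = openai.Answer.create(
    model="curie",
    search_model="ada",
    question="When did the office open?",
    examples_context="Our office is located in Stockholm and opened in 2019.",
    examples=[["Where is the office located?", "The office is in Stockholm."]],
    file="file-abc123",
    max_rerank=10,
    max_tokens=20,
    stop=["\n"],
    return_prompt=True,
)

# The exact prompt that was sent to /completions.
print(response["prompt"])
```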

Thanks again for the insight. However, it’s a new day, which means a new question about the answers endpoint.

Just to make sure I keep the terminology correct, a “document” is what we call each line in the jsonlines file, right? What I wonder is whether the answers endpoint can ever draw from multiple “documents” at a time when answering a question, or whether it ends up using just one after searching for the proper “documents”?

In other words, if I have pieces of information that are correlated, should I ensure that they are in the same “document” so that it can give the best answer?

I’ll reply to these one at a time:

Just to make sure I keep the terminology correct, a “document” is what we call each line in the jsonlines file, right?

Correct!
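
For example, a small answers file could be written and uploaded roughly like this (a sketch; the documents here are just placeholders, and only the text field is used for answering):

```python
import json
import openai

# Each line of the jsonlines file is one "document". Only "text" is used when
# answering; "metadata" is optional and just echoed back for your convenience.
docs = [
    {"text": "Our office is located in Stockholm and opened in 2019.", "metadata": "office-facts"},
    {"text": "Support is available on weekdays between 09:00 and 17:00.", "metadata": "support-hours"},
]

with open("docs.jsonl", "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Upload with purpose="answers" so it can be referenced by the /answers endpoint.
openai.File.create(file=open("docs.jsonl"), purpose="answers")
```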

What I wonder is whether the answers endpoint can ever draw from multiple “documents” at a time when answering a question, or whether it ends up using just one after searching for the proper “documents”?

/answers will filter all the documents in a file down to just max_rerank of them, and then generate a score to indicate how relevant each document is. So if you set max_rerank=5, you should get the 5 most relevant documents in your file.
The tricky thing is that these might not all fit into your context, so we might filter some of them out.
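
For example, something roughly like this lets you see which documents made it through reranking, via the selected_documents field in the response (a quick sketch; the file ID, question, and examples are placeholders):

```python
import openai

# Quick sketch: keep only the 5 highest-scoring documents and see which ones
# were actually selected. The file ID, question, and examples are placeholders.
response = openai.Answer.create(
    model="curie",
    search_model="ada",
    question="When did the office open?",
    examples_context="Our office is located in Stockholm and opened in 2019.",
    examples=[["Where is the office located?", "The office is in Stockholm."]],
    file="file-abc123",
    max_rerank=5,
    max_tokens=20,
)

# Documents that made it past reranking (and into the prompt, context permitting).
for doc in response["selected_documents"]:
    print(doc["document"], doc["text"])
```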

In other words, if I have pieces of information that are correlated, should I ensure that they are in the same “document” so that it can give the best answer?

Your results may vary, but that’s not a bad idea to try. Our hope is that we’ll soon be able to move away from the keyword-based filtering method we have now to something considerably more sophisticated.

Hi again @hallacy, I’ve played around a bit more and have some new questions.

Returning the generated prompt really helped me figure out what is going on, and I have identified what I think my “problem” is.

The search module works very well, even when using ada. The document it scores highest is consistently the correct one, as long as I give it a sufficiently high max_rerank. However, it still answers the questions incorrectly, and when I view the generated prompt I can easily see why.

Even though it did score the correct document the highest, it still includes all the other retrieved documents in the prompt, which drowns out the correct one. I need a high max_rerank for it to find the correct document, but how can I prevent it from using all the other low-scoring documents in the prompt?

It also makes the prompt very expensive in tokens. When I set max_rerank = 200 and it throws all of those documents (maybe not actually all, but it feels like it) into the prompt instead of just using the highest-scoring one, I am almost at the maximum token cost.

To summarize, I need a high max_rerank for the search, but I also need a way to exclude the unwanted documents from the answers prompt.

Hi Amir, at this time there’s no fine-grained control over which documents are used in the prompt. A different option would be to use the Search and Completions endpoints separately.
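
A rough sketch of that two-step approach (the engine names, documents, and prompt format here are just illustrative, not the exact prompt /answers builds):

```python
import openai

documents = [
    "Our office is located in Stockholm and opened in 2019.",
    "Support is available on weekdays between 09:00 and 17:00.",
]

# Step 1: use the Search endpoint to score documents against the question,
# then keep only the single best match instead of everything up to max_rerank.
question = "When did the office open?"
search = openai.Engine("ada").search(documents=documents, query=question)
best = max(search["data"], key=lambda d: d["score"])
context = documents[best["document"]]

# Step 2: build your own prompt with just that document and send it to /completions.
prompt = f"Answer the question using the context.\nContext: {context}\nQ: {question}\nA:"
completion = openai.Completion.create(
    engine="curie",
    prompt=prompt,
    max_tokens=30,
    temperature=0,
    stop=["\n"],
)
print(completion["choices"][0]["text"].strip())
```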

Ah yes, that’s what I feared. I’ve actually done what you suggested and created essentially my own version of Answers, but I was hoping there was an option I had missed somewhere. Thanks.
