Ranking / Scoring documents in Question Answering

Andreas · February 27, 2022, 9:47pm

I am wondering how to rank the documents when using Question Answering, so that I know which document(s) the answers are based on. The API reference for answers doesn't make it clear how to do this.

Any ideas?

sps · February 28, 2022, 10:59am

Hi @Andreas

The response to a request to the answers endpoint includes a selected_documents attribute. See the OpenAI API reference for answers; the sample response is on the right-hand side.

You may also want to explore return_metadata.
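To see which selected documents matter most, you can sort them by score whenever a score is present. The following is a minimal sketch. It assumes the response has already been parsed into a plain dict shaped like the sample in the reference: each selected_documents entry has a "document" index and "text", plus an optional "score". rank_selected_documents is a hypothetical helper, and the scores in the example are made up.

```python
from typing import Any


def rank_selected_documents(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return selected_documents sorted by score, highest first.

    Assumes entries look like {"document": int, "text": str} with an
    optional "score" key. Entries without a score keep their original
    relative order and are placed after all scored entries.
    """
    selected = response.get("selected_documents", [])
    scored = [d for d in selected if "score" in d]
    unscored = [d for d in selected if "score" not in d]
    # sorted() is stable, so equal scores keep their original order.
    scored = sorted(scored, key=lambda d: d["score"], reverse=True)
    return scored + unscored


# Hypothetical response; the score values are invented for illustration.
example = {
    "answers": ["puppy A."],
    "selected_documents": [
        {"document": 0, "text": "Puppy A is happy. ", "score": 3.1},
        {"document": 1, "text": "Puppy B is sad. ", "score": 14.2},
    ],
}

for doc in rank_selected_documents(example):
    print(doc["document"], doc.get("score"), doc["text"].strip())
```

If no entry carries a score, the helper simply returns the documents in the order the endpoint gave them. That is the situation described in the next reply.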

Andreas · February 28, 2022, 11:12am

Hi @sps

It does indeed return selected_documents, but when I try it out (with 5 sample documents), they are all returned in no particular order and without a score, so it is unclear which ones are relevant.

sps · February 28, 2022, 12:18pm

That is very intriguing. Given how the answers api first uses a search model and then another engine for completion, it does seem weird that the score isn’t being returned with selected documents.

UPDATE: I tested it on my end, and scores are being returned.

It seems that the sample response snippet in the docs is out of date.

Andreas · February 28, 2022, 8:52pm

Oh, interesting! Did you use uploaded documents? So far I have just used “hard coded” documents like in the example:

documents=["Puppy A is happy.", "Puppy B is sad."],

and I get no score…

sps · March 1, 2022, 5:15pm

Yes @Andreas, I’m using file instead of documents, but it should return scores regardless of whether I use file (to upload a large number of documents) or simply pass an array of lines via documents. This keeps getting interesting.

Can you also share the search_model and engine you’re using?

UPDATE: Yes, you’re correct: scores are returned when file is used, but not when documents is used.

I wonder why that would be the case. @staff
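Because scores appeared only with file in these tests, one option is to write the documents to a file instead of passing them inline. Whether a small file also produces scores is still an open question in this thread. The following is a minimal sketch. It assumes the legacy answers file format: JSON Lines, with one object per line, a "text" key, and an optional "metadata" key. to_answers_jsonl is a hypothetical helper. The upload step is not shown; as we recall, the legacy API used the files endpoint with purpose "answers" for this.

```python
import json
from typing import Optional


def to_answers_jsonl(texts: list[str], metadata: Optional[list[str]] = None) -> str:
    """Serialize documents as JSON Lines, one {"text": ...} object per line.

    Assumes the legacy answers file format: a "text" key per line and an
    optional "metadata" key (which return_metadata would echo back).
    """
    if metadata is not None and len(metadata) != len(texts):
        raise ValueError("metadata must have exactly one entry per text")
    records = []
    for i, text in enumerate(texts):
        record = {"text": text}
        if metadata is not None:
            record["metadata"] = metadata[i]
        records.append(record)
    # Every record ends with a newline, including the last one.
    return "".join(json.dumps(r) + "\n" for r in records)


jsonl = to_answers_jsonl(
    ["Puppy A is happy.", "Puppy B is sad."],
    metadata=["puppy A", "puppy B"],
)
print(jsonl, end="")
```

Keeping the metadata next to each line makes it easier to map selected_documents back to your own records later.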

lmccallum · March 1, 2022, 8:20pm

I believe the answers endpoint first uses a keyword search to narrow the documents to the top 200 and then re-ranks those, giving a score to each. This assumes you uploaded a file with more than 200 lines. If you have fewer than 200 documents, I think the search/re-rank steps may be skipped, so no score is generated.
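The two-stage behaviour described above can be modelled with a toy example. This is not OpenAI's implementation. It is a sketch of what the post describes, under these assumptions: stage 1 is a naive keyword-overlap count, and stage 2 uses token-set Jaccard similarity as a stand-in for the search model's re-ranking. The functions and the 200 cut-off parameter are hypothetical. Small collections pass through unscored, which would match what Andreas observed.

```python
import re
from collections import Counter
from typing import Callable

TOP_K = 200  # candidate cut-off described in the post


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(question: str, doc: str) -> int:
    """Stage 1 (toy): count occurrences of question terms in the document."""
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in set(tokenize(question)))


def jaccard_rerank(question: str, doc: str) -> float:
    """Stage 2 (toy stand-in for a search model): token-set Jaccard similarity."""
    q, d = set(tokenize(question)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def select_documents(
    question: str,
    documents: list[str],
    rerank: Callable[[str, str], float] = jaccard_rerank,
    top_k: int = TOP_K,
) -> list[dict]:
    if len(documents) <= top_k:
        # Small collections skip both stages, so no score is attached.
        return [{"document": i, "text": d} for i, d in enumerate(documents)]
    # Stage 1: keep the top_k documents by keyword overlap.
    candidates = sorted(
        range(len(documents)),
        key=lambda i: keyword_score(question, documents[i]),
        reverse=True,
    )[:top_k]
    # Stage 2: re-rank the survivors and attach each one's score.
    ranked = [
        {"document": i, "text": documents[i], "score": rerank(question, documents[i])}
        for i in candidates
    ]
    return sorted(ranked, key=lambda d: d["score"], reverse=True)


docs = ["Puppy A is happy.", "Puppy B is sad.", "Kittens sleep a lot."]
print(select_documents("Which puppy is happy?", docs))           # unscored
print(select_documents("Which puppy is happy?", docs, top_k=2))  # scored
```

Lowering top_k in the second call forces both stages on a tiny input, so you can see scores appear without needing 200+ documents.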

sps · March 2, 2022, 5:37pm

That would mean that up to 200 documents can be consumed in full by the endpoint when generating an answer, but when that limit is exceeded (which is only possible via file), it has to select the top 200 (max) semantically similar docs for answer generation. Very interesting.

Thanks @lmccallum

lmccallum · March 2, 2022, 6:59pm

I believe that is correct. And regardless of whether or not a file is used, a maximum of 2048 tokens per line is permitted.
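Before uploading, you could flag documents that may exceed the per-line limit mentioned above. The following is a minimal sketch with an assumed default estimator: it uses the rough rule of thumb of about 4 characters per English token. It is not an exact count, and for real checks you should pass the model's actual tokenizer as count_tokens. overlong_lines and estimate_tokens are hypothetical helpers.

```python
import math
from typing import Callable

MAX_TOKENS_PER_LINE = 2048  # per-line limit stated in the post


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 4 characters per token (English text)."""
    return math.ceil(len(text) / 4)


def overlong_lines(
    texts: list[str],
    limit: int = MAX_TOKENS_PER_LINE,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> list[int]:
    """Return indices of documents whose token count (as estimated) exceeds limit."""
    return [i for i, t in enumerate(texts) if count_tokens(t) > limit]


docs = ["Puppy A is happy.", "word " * 3000]
print(overlong_lines(docs))
```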
