Understanding Search/QA Endpoints

Hi OpenAI team!

I would love to understand the Search/QA endpoints.
Suppose max_rerank=5, so 5 documents will be selected for a query.

  1. What is the role of those documents? Will they be concatenated and provided as a prompt along with our query?

  2. Must the total token length of the selected documents plus the length of the query be less than 2048? How do we know the combined length of all 5 documents?

  3. Does OpenAI recommend making each document an equal token length?

  4. How is the pricing of the Search/QA endpoints calculated?

Hey! Here are my responses, but anyone else feel free to chime in :smiley:

  1. If you set max_rerank to 5, then a maximum of 5 documents will be re-ranked and returned by search (instead of the default 200). The Search and Answers endpoints aren’t like Completions, in the way that everything fits into a “prompt.” Instead, documents are uploaded, indexed, and ranked according to their relevance to a query.

  2. You can pass up to 1 GB worth of documents - orders of magnitude more than 2,048 tokens. Each document can be up to 2,034 tokens, minus the number of tokens in the query. You can calculate the length of documents in the same way you’d calculate token length when using Completions, such as by using a tokenizer (like in this example).

  3. No, it’s not necessary for each document to have equal token length.

  4. You can find our endpoint pricing calculations here. For example, the cost of Search is:

(number of tokens in all of your documents)
+ (number of documents + 1) * 14
+ (number of documents + 1) * (number of tokens in your query)
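To make points 2 and 4 concrete, here's a minimal sketch in Python. The ~4-characters-per-token heuristic, the helper names, and the example numbers are my own assumptions; for exact counts, tokenize with the same tokenizer the model uses, as mentioned above:

```python
# Rough token budgeting plus the Search pricing formula above.
# The ~4-characters-per-token heuristic is only an approximation of
# English text; for exact counts, use the model's actual tokenizer.

def estimate_tokens(text: str) -> int:
    """Approximate token count (~4 characters per token of English)."""
    return max(1, len(text) // 4)

def search_token_cost(doc_tokens: int, num_docs: int, query_tokens: int) -> int:
    """Total tokens billed for one Search call, per the formula above."""
    return doc_tokens + (num_docs + 1) * 14 + (num_docs + 1) * query_tokens

docs = ["An example document about training puppies."] * 5
query = "Which document mentions puppies?"

doc_tokens = sum(estimate_tokens(d) for d in docs)
query_tokens = estimate_tokens(query)

# Each document must also fit the per-document limit (2034 minus the
# query's token count):
assert all(estimate_tokens(d) + query_tokens <= 2034 for d in docs)

print(search_token_cost(doc_tokens, num_docs=len(docs), query_tokens=query_tokens))
```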

Hope that helps, and please let me know if you have any other questions!


@joey Thanks so much for the detailed answer!!

I would love to understand the actual process GPT-3 uses to answer a question with the QA endpoint.

Do I understand the process correctly as follows?

Suppose max_rerank = 5, and a query is given.

(1) The 5 most relevant documents will be retrieved.

(2.1) Each document and the query will be concatenated to make completions.
(2.2) So we perform 5 completions in this case – that’s why the pricing above is multiplied by a factor of 5.

(3) The “best of 5” completions will be returned as the result?

(4) If all of the above is correct, then the token length of each “document + query + completion” must be <= 2048?
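In code, my mental model of step (4) is roughly this (a hypothetical budget check, not actual API behavior – the token counts are made-up inputs):

```python
# Hypothetical check for step (4): for each of the max_rerank
# documents, document + query + completion must fit within the
# 2048-token context window. Token counts here are illustrative.

CONTEXT_WINDOW = 2048

def completion_budget(doc_tokens: int, query_tokens: int) -> int:
    """Tokens left for the completion once a document and the query are in."""
    return CONTEXT_WINDOW - doc_tokens - query_tokens

# e.g. a 1500-token document and a 48-token query leave 500 tokens:
print(completion_budget(doc_tokens=1500, query_tokens=48))
```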

Thanks again!


Hi @joey, following up on Answers pricing.
I have a set of texts – you could think of them as “documents” as your API requires.

I have one prompt. I want answers by applying the same prompt to every entry in my set of texts.

Can I achieve my goal with just one API call? How would I do that (can you send an example of the syntax via curl or Python)?

I’d like to use as few tokens as possible, and in this case, I’m thinking one API call with one prompt would save tokens vs. multiple API calls with the same prompt.

Please let me know if anything is unclear. Thanks for your help.

If you’re planning to send in the same prompt with the same documents each time, I would consider just using the /completions endpoint instead to save tokens.

In short, the answers endpoint will make calls to both the search and completions endpoints. Since you already know the ranking of your documents, you don’t need to redo the search step.

Instead, you can pass in return_prompt=True as an additional argument to the answers endpoint which will return the prompt we use for the completions endpoint. You can then just use that prompt and generate multiple times off of it instead.
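As a rough sketch of that two-step flow (the payload shapes below follow the legacy Answers/Completions parameters as I recall them – double-check the exact field names against the API reference before relying on them):

```python
# Sketch of the two-step flow: first get the constructed prompt back
# from the answers endpoint via return_prompt, then reuse that prompt
# against the completions endpoint on subsequent calls. These are
# plain request payloads, not live API calls; field names follow the
# legacy Answers/Completions endpoints and should be verified.

answers_request = {
    "model": "curie",
    "question": "Which puppy is happy?",
    "documents": ["Puppy A is happy.", "Puppy B is sad."],
    "examples": [["What color is the sky?", "Blue."]],
    "examples_context": "The sky is blue on a clear day.",
    "return_prompt": True,  # ask the endpoint to return the prompt it built
}

# Suppose the response included the constructed prompt:
returned_prompt = "<prompt returned by the answers endpoint>"

# Reuse that prompt directly with completions, skipping the
# (already-known) search step on later calls:
completions_request = {
    "model": "curie",
    "prompt": returned_prompt,
    "max_tokens": 16,
}

print(answers_request["return_prompt"], "prompt" in completions_request)
```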
