Hey! Here are my responses, but anyone else feel free to chime in
If you set max_rerank to 5, then a maximum of 5 documents will be re-ranked and returned by Search (instead of the default 200). The Search and Answers endpoints aren’t like Completions, where everything has to fit into a single “prompt.” Instead, documents are uploaded, indexed, and ranked according to their relevance to a query.
You can pass up to 1 GB worth of documents - orders of magnitude more than 2,048 tokens. Each document can be up to 2,034 tokens, minus the number of tokens in the query. You can calculate the length of documents in the same way you’d calculate token length when using Completions, such as by using a tokenizer (like in this example).
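If you just want a quick estimate before reaching for a real tokenizer, a common rule of thumb is roughly 4 characters per token for English text. A minimal sketch of that heuristic (an approximation only, not the actual tokenizer the API uses):

```python
def rough_token_count(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token
    rule of thumb for English text. For exact counts, run the
    same tokenizer the API uses instead."""
    return max(1, len(text) // 4)

print(rough_token_count("hello world"))  # 11 characters -> estimate of 2 tokens
```

For billing-sensitive decisions you should still use an exact tokenizer, since the heuristic can be off for code, non-English text, or unusual formatting.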
No, it’s not necessary for each document to have equal token length.
You can find our endpoint pricing calculations here. For example, the cost of Search is:
(number of tokens in all of your documents)
+ (number of documents + 1) * 14
+ (number of documents + 1) * (number of tokens in your query)
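To make the arithmetic concrete, here’s a small sketch applying the formula above (a hypothetical helper for illustration, not official billing code):

```python
def search_cost_tokens(doc_token_counts, query_tokens):
    """Estimate billed tokens for a Search request using the
    formula quoted above: sum of document tokens, plus a 14-token
    overhead and the query tokens, each counted (n_docs + 1) times."""
    n_docs = len(doc_token_counts)
    return (sum(doc_token_counts)
            + (n_docs + 1) * 14
            + (n_docs + 1) * query_tokens)

# e.g. 5 documents of 100 tokens each with a 10-token query:
# 500 + 6*14 + 6*10 = 644 tokens
print(search_cost_tokens([100] * 5, 10))  # 644
```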
Hope that helps, and please let me know if you have any other questions!
I would love to understand the actual process the QA (Answers) endpoint uses with GPT-3 to answer a question.
Do I understand the process correctly as follows?
Suppose max_rerank = 5, and a query is given.
(1) The 5 most relevant documents will be retrieved.
(2.1) Each document will be concatenated with the query to make a completion.
(2.2) So we perform 5 completions in this case – that’s why the pricing above is multiplied by a factor of 5.
(3) The “best of 5” completions will be returned as the result?
(4) If all of the above is correct, then each “document + query + completion” token length must be <= 2048.
Hi @joey, following up on Answers pricing.
I have a set of texts - you could think of them as “documents” as your API requires them.
I have one prompt. I want answers by applying the same prompt to every entry in my set of texts.
Can I achieve my goal with just one API call? How would I do that (can you share an example of the syntax via curl or Python)?
I’d like to use as few tokens as possible, and in this case, I’m thinking one API call with one prompt would save tokens vs. multiple API calls with the same prompt.
If you’re planning to send in the same prompt with the same documents each time, I would consider just using the /completions endpoint instead to save tokens.
In short, the answers endpoint will make calls to both the search and completions endpoints. Since you already know the ranking of your documents, you don’t need to redo the search step.
Instead, you can pass in return_prompt=True as an additional argument to the answers endpoint which will return the prompt we use for the completions endpoint. You can then just use that prompt and generate multiple times off of it instead.
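As a sketch of that flow, assuming the legacy (pre-1.0) openai Python library - parameter names follow the old Answers endpoint docs, and the actual API calls are left commented out since the endpoint has since been deprecated:

```python
# Sketch: fetch the generated prompt once via return_prompt=True,
# then reuse it with the completions endpoint directly.
try:
    import openai  # legacy (pre-1.0) client
except ImportError:
    openai = None  # library not installed; calls below stay illustrative

# Arguments for a single Answers call. return_prompt=True asks the API
# to echo back the prompt it builds for its internal completions step.
params = {
    "model": "davinci",
    "search_model": "ada",
    "question": "Which puppy is happy?",
    "documents": ["Puppy A is happy.", "Puppy B is sad."],
    "examples_context": "In 2017, U.S. life expectancy was 78.6 years.",
    "examples": [["What is human life expectancy in the United States?",
                  "78.6 years."]],
    "max_rerank": 5,
    "max_tokens": 5,
    "return_prompt": True,
}

# response = openai.Answer.create(**params)
# prompt = response["prompt"]
# Reuse `prompt` with the completions endpoint to generate multiple
# answers without paying for the search step again:
# completion = openai.Completion.create(model="davinci", prompt=prompt, n=3)
```

The idea is that the search/ranking cost is paid once to obtain the prompt; every subsequent generation is a plain completions call on that saved prompt.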