Q&A pricing for a 1-million-token data file

I am curious about the pricing of the Q&A API for the following scenario:

a) We have a data file of 1 million tokens as the knowledge source.

b) The question asked is 1,000 tokens.

c) The answer returned is 1,000 tokens.

How much will it cost to ask a single question and receive an answer under these conditions?

Really, I want to confirm that the 1-million-token data file scanned to answer the question is not part of the pricing. If we were charged for the full 1 million tokens on every question, the cost per question would be prohibitive. Is there any way to avoid this charge?

Hi! Thanks for the question.

The short answer is that pricing for the answers endpoint is a bit involved, but probably not as expensive as you're worried about.

The longer answer:
Uploading files to the endpoint currently costs nothing. Your org has a maximum data storage limit of 1 GB, which can be increased if needed.

When you submit a question to the API, a couple of steps happen behind the scenes. The first step (which is free) is keyword matching to narrow the total pool of documents down to max_rerank candidates. The second step (which is the expensive one) uses the /search endpoint to rerank those candidates against your input query.

After the search step, we run a /completions call using some of those reranked documents.
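To make the flow concrete, here's a minimal, runnable sketch of those three steps. Every function in it is an illustrative stand-in, not our actual implementation:

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def keyword_filter(query, docs, max_rerank):
    # Step 1 (free): keyword matching narrows the pool down to max_rerank docs.
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(d)))[:max_rerank]

def semantic_rerank(query, docs):
    # Step 2 (billed): stand-in for the /search rerank; the real step scores
    # each candidate against the query with a search model.
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(d)))

def answer(query, docs, max_rerank=10, n_context=2):
    ranked = semantic_rerank(query, keyword_filter(query, docs, max_rerank))
    # Step 3 (billed): build a prompt from the top docs; the real endpoint
    # sends this prompt to /completions to generate the answer.
    return "\n".join(ranked[:n_context]) + f"\nQ: {query}\nA:"

docs = ["The warranty lasts two years.", "Shipping takes five business days."]
print(answer("How long is the warranty?", docs))
```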

So, this isn't exact, but you could approximate it as:

cost ≈ max_rerank × (search cost for one average-length document plus the query, at the search model's rate)
+ (completion cost for the model, which in your case covers the question (1,000 tokens), some tokens from the selected documents, and the answer (1,000 tokens))
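For a back-of-envelope number on your scenario, something like this (the per-token rates and document sizes below are placeholders, not real prices; plug in the actual rates for your chosen search and completion models):

```python
# Rough cost estimate for one answers call under the scenario above.
# The per-token rates are HYPOTHETICAL placeholders, not real prices.
SEARCH_RATE = 0.8e-6        # $/token for the search model (placeholder)
COMPLETION_RATE = 6.0e-6    # $/token for the completion model (placeholder)

max_rerank = 10             # docs passed to the billed /search rerank
avg_doc_tokens = 1_000      # assumed average candidate-document length
question_tokens = 1_000     # from the scenario
answer_tokens = 1_000       # from the scenario
context_doc_tokens = 2_000  # assumed tokens of selected docs in the prompt

# Step 2: each of the max_rerank docs is scored together with the query.
search_cost = max_rerank * (avg_doc_tokens + question_tokens) * SEARCH_RATE
# Step 3: one completion over question + selected docs + answer.
completion_cost = (question_tokens + context_doc_tokens + answer_tokens) * COMPLETION_RATE

print(f"search ≈ ${search_cost:.4f}, completion ≈ ${completion_cost:.4f}, "
      f"total ≈ ${search_cost + completion_cost:.4f}")
```

The key point for your worry: the full 1-million-token file never enters the billed steps. Only the max_rerank documents that survive the free keyword-matching step do.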

Does that help?


Hi @hallacy, can docs be embedded at index time instead of query time to reduce search costs? Also, how are doc tokens appended to the prompt? User control over that might help with optimization, and perhaps with reasoning across documents.

Indexing embeddings: We're actively working on something to address this, but at the moment there isn't a way to pre-embed docs to reduce costs.

Doc tokens: It'll be easier to show you. Set the return_prompt field, and your API response should contain the full prompt we send to the /completions endpoint.
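Something like this with the openai Python client (the file ID, models, and examples here are placeholders, and I'm going from memory on the response shape):

```python
import openai

# Sketch of an answers call with return_prompt set; the file ID, models,
# and examples are placeholders.
response = openai.Answer.create(
    search_model="ada",
    model="curie",
    question="How long is the warranty?",
    file="file-XXXXXXXX",  # your uploaded knowledge file
    examples_context="The warranty lasts two years.",
    examples=[["How long is the warranty?", "Two years."]],
    max_rerank=10,
    max_tokens=100,
    stop=["\n"],
    return_prompt=True,  # echo back the full prompt sent to /completions
)

print(response["prompt"])      # the assembled prompt, including doc tokens
print(response["answers"][0])  # the model's answer
```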


Great to hear you're working on it! Doesn't vector search typically involve pre-embedded docs and Maximum Inner Product Search (REALM uses it, for example)?

Ah good point, I didn’t consider looking at the return_prompt.

Does all of this apply similarly to fine-tuning files?