Q&A pricing for a 1-million-token data file

I am curious about the pricing of the Q&A API for the following scenario:

a) We have a data file of 1 million tokens as the knowledge source.

b) The question asked is 1,000 tokens.

c) The answer returned is 1,000 tokens.

How much will it cost to ask a single question and receive an answer under these conditions?

Really, I want to confirm that the 1-million-token data file scanned to answer the question is not part of the pricing. If we were charged for the full 1 million tokens on every question, the cost per question would be prohibitive. Is there any way to avoid this charge?

Hi! Thanks for the question.

The short answer is that pricing for the answers endpoint is a bit involved, but probably not as expensive as you're worried about.

The longer answer:
Uploading files to the endpoint currently costs nothing. Your org has a maximum data storage limit of 1 GB, which can be increased if needed.

When you submit a question to the API, a couple of steps happen behind the scenes. The first step (which is free) is keyword matching to narrow the total pool of documents down to max_rerank candidates. The second step (which is the expensive one) uses the /search endpoint to rerank those candidates against your input query.

After the search step, we run a /completions call using some of those reranked documents.
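To make the flow concrete, here's a minimal, runnable sketch of those three steps. Every function in it is an illustrative stand-in, not our actual implementation:

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def keyword_filter(query, docs, max_rerank):
    # Step 1 (free): keyword matching narrows the pool down to max_rerank docs.
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(d)))[:max_rerank]

def semantic_rerank(query, docs):
    # Step 2 (billed): stand-in for the /search rerank; the real step scores
    # each candidate against the query with a search model.
    q = tokens(query)
    return sorted(docs, key=lambda d: -len(q & tokens(d)))

def answer(query, docs, max_rerank=10, n_context=2):
    ranked = semantic_rerank(query, keyword_filter(query, docs, max_rerank))
    # Step 3 (billed): build a prompt from the top docs; the real endpoint
    # sends this prompt to /completions to generate the answer.
    return "\n".join(ranked[:n_context]) + f"\nQ: {query}\nA:"

docs = ["The warranty lasts two years.", "Shipping takes five business days."]
print(answer("How long is the warranty?", docs))
```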

So, this isn't exact, but you could approximate it as:

cost ≈ max_rerank × (search cost for one average-length document plus the query, at the search model's rate)
+ (completion cost for the model, which in your case covers the question (1,000 tokens), some tokens from the selected documents, and the answer (1,000 tokens))
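For a back-of-envelope number on your scenario, something like this (the per-token rates and document sizes below are placeholders, not real prices; plug in the actual rates for your chosen search and completion models):

```python
# Rough cost estimate for one answers call under the scenario above.
# The per-token rates are HYPOTHETICAL placeholders, not real prices.
SEARCH_RATE = 0.8e-6        # $/token for the search model (placeholder)
COMPLETION_RATE = 6.0e-6    # $/token for the completion model (placeholder)

max_rerank = 10             # docs passed to the billed /search rerank
avg_doc_tokens = 1_000      # assumed average candidate-document length
question_tokens = 1_000     # from the scenario
answer_tokens = 1_000       # from the scenario
context_doc_tokens = 2_000  # assumed tokens of selected docs in the prompt

# Step 2: each of the max_rerank docs is scored together with the query.
search_cost = max_rerank * (avg_doc_tokens + question_tokens) * SEARCH_RATE
# Step 3: one completion over question + selected docs + answer.
completion_cost = (question_tokens + context_doc_tokens + answer_tokens) * COMPLETION_RATE

print(f"search ≈ ${search_cost:.4f}, completion ≈ ${completion_cost:.4f}, "
      f"total ≈ ${search_cost + completion_cost:.4f}")
```

The key point for your worry: the full 1-million-token file never enters the billed steps. Only the max_rerank documents that survive the free keyword-matching step do.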

Does that help?


Hi @hallacy, can docs be embedded at index time instead of query time to reduce search costs? Also, how are doc tokens appended to the prompt? User control over that might help with optimization, and perhaps with reasoning across documents.

Indexing embeddings: We're actively working on something to address this, but at the moment there isn't a way to pre-embed docs to reduce costs.

Doc tokens: It'll be easier to show you. Set the return_prompt field, and your API response should contain the full prompt we send to the /completions endpoint.
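Something like this with the openai Python client (the file ID, models, and examples here are placeholders, and I'm going from memory on the response shape):

```python
import openai

# Sketch of an answers call with return_prompt set; the file ID, models,
# and examples are placeholders.
response = openai.Answer.create(
    search_model="ada",
    model="curie",
    question="How long is the warranty?",
    file="file-XXXXXXXX",  # your uploaded knowledge file
    examples_context="The warranty lasts two years.",
    examples=[["How long is the warranty?", "Two years."]],
    max_rerank=10,
    max_tokens=100,
    stop=["\n"],
    return_prompt=True,  # echo back the full prompt sent to /completions
)

print(response["prompt"])      # the assembled prompt, including doc tokens
print(response["answers"][0])  # the model's answer
```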


Great to hear you're working on it! Doesn't vector search typically involve pre-embedded docs and Maximum Inner Product Search (REALM uses it, for example)?

Ah good point, I didn’t consider looking at the return_prompt.

Does all of this apply similarly to fine-tuning files?