Weird, haven’t seen much on their search service. Why wouldn’t you just create embedding vectors from the documents and search that way? If you use their embedding engine (latest is text-embedding-ada-002) you would only pay once for what you embed. But you would pay for your own compute and database to retrieve the closest search results.
In the given formula for counting tokens, the variable “Number of tokens in your query” refers to the number of tokens in the search query that the user inputs. It does not refer to the number of tokens in the documents or statements in the dataset.
Therefore, in your example, if a user enters a query with, let’s say, 10 tokens, the total tokens would be calculated as follows:
The embedding is priced differently from the completion. Take the number of tokens you want to embed, divide by 1000, and multiply by the embedding rate for text-embedding-ada-002.
That is your fixed, one-time cost to embed the documents
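As a rough sketch, the one-time embedding cost works out like this (the rate below is an assumed placeholder, not the actual price — check OpenAI's pricing page for the current per-1K-token figure):

```python
# Assumed illustrative rate for text-embedding-ada-002, in $ per 1,000 tokens.
# Substitute the real figure from OpenAI's pricing page.
ADA_002_RATE = 0.0001

def embedding_cost(total_tokens: int) -> float:
    """One-time cost to embed a document set of `total_tokens` tokens."""
    return total_tokens / 1000 * ADA_002_RATE

# e.g. embedding 2,000,000 tokens of documents:
cost = embedding_cost(2_000_000)
```

You pay this once; re-running queries against the stored vectors does not re-embed the documents.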
To make a query:
Take the tokens for the user's query, divide by 1000, and multiply by the embedding rate for text-embedding-ada-002 to get the cost of deriving a vector you can use for searching (A)
Then add together the tokens for the snippets you find using semantic search (to use as context) and the tokens for the actual question (including any extra text where you might say "referring to the following context, answer the question below")
Divide that total by 1000 and multiply by the text-davinci-003 rate to get the cost of asking the question (B)
When you get the answer (completion) back, divide the tokens used by the completion by 1000 and multiply by the text-davinci-003 rate to get the cost of your answer (C)
Add A, B, and C. This is the cost of the query (give or take a couple of tokens)
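The A + B + C breakdown above can be sketched as a small helper. Both rates here are assumed placeholders (real prices change; take them from OpenAI's pricing page), and the token counts are whatever your tokenizer reports for each piece:

```python
# Assumed illustrative rates, in $ per 1,000 tokens. Substitute the
# current prices for text-embedding-ada-002 and text-davinci-003.
ADA_002_RATE = 0.0001
DAVINCI_003_RATE = 0.02

def query_cost(query_tokens: int, context_tokens: int,
               prompt_tokens: int, completion_tokens: int) -> float:
    """Per-query cost = A (embed query) + B (send prompt) + C (completion)."""
    a = query_tokens / 1000 * ADA_002_RATE                        # embed the user's query
    b = (context_tokens + prompt_tokens) / 1000 * DAVINCI_003_RATE  # context snippets + question text
    c = completion_tokens / 1000 * DAVINCI_003_RATE               # the answer that comes back
    return a + b + c

# e.g. a 10-token query, 1500 tokens of retrieved context,
# 50 tokens of question/instructions, and a 200-token answer:
total = query_cost(10, 1500, 50, 200)
```

Note that B and C are what dominate: the embedding step (A) is orders of magnitude cheaper per token than the completion model.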
The cost will depend on the question asked and the length of the context you include in the final query