Computing the cost of embedding requests

Hello everyone!

I’m developing code to compare different retriever techniques using LangChain and OpenAI embeddings (currently text-embedding-ada-002) to find the best retriever for my specific use case in terms of time, cost, and efficiency.

I’m comparing the following retriever modalities:

  • Base Retriever (with different chunk sizes and overlaps)
  • Contextual Compressor Retriever
  • Ensemble Retriever
  • Multi Query Retriever
  • Parent Document Retriever
  • Time Weighted Retriever

To compute the cost of vector store setup, I am using this technique:

import tiktoken

model_cost = 0.10 / 1_000_000  # $0.10 per 1M tokens for text-embedding-ada-002
total_tokens = 0

splits = text_splitter.split_documents(docs)
encoding = tiktoken.encoding_for_model("text-embedding-ada-002")

# Sum the token count of every chunk that will be embedded
for chunk in splits:
    total_tokens += len(encoding.encode(chunk.page_content))

total_cost = total_tokens * model_cost

However, I am not sure how to compute the cost of each request to the retriever. For instance, the cost of this request:

snippets = retriever.invoke(query)

I tried using get_openai_callback, but it didn’t work.

Can anyone help me, please?

Once you have embedded your database, the only additional API cost is the single call that embeds your new search query, billed by the tokens sent.

Everything else is performed algorithmically.

More advanced methods can transform your search input with AI, or otherwise send the query tokens in a different manner, such as once as a whole and then again split by sentence.
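For a pattern like that, the billed tokens are simply the sum over every piece sent. A minimal sketch (the sentence splitting and the word-count tokenizer here are stand-ins for illustration, not the real encoding):

```python
# Hypothetical sketch: if a method embeds the whole query plus each
# sentence separately, total billed tokens = sum over all pieces sent.
def multi_request_tokens(query: str, count_tokens) -> int:
    sentences = [s.strip() for s in query.split(".") if s.strip()]
    pieces = [query] + sentences  # once in whole, then by sentence splits
    return sum(count_tokens(p) for p in pieces)

# Word count as a stand-in tokenizer purely for this example:
tokens = multi_request_tokens(
    "First question. Second question.",
    count_tokens=lambda text: len(text.split()),
)
```

In practice you would pass `lambda text: len(encoding.encode(text))` with the tiktoken encoding from the setup code, then multiply by the per-token price as before.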

The implementation is up to you, and for you to understand.