Hello everyone!
I’m developing code to compare different retriever techniques using LangChain and OpenAI embeddings (currently text-embedding-ada-002), to find the best retriever for my specific use case in terms of time, cost, and efficiency.
I’m comparing the following retriever modalities:
- Base Retriever (with different chunk sizes and overlaps)
- Contextual Compressor Retriever
- Ensemble Retriever
- Multi Query Retriever
- Parent Document Retriever
- Time Weighted Retriever
To compute the one-off cost of building the vector store, I count tokens with tiktoken:

```python
import tiktoken

model_cost = 0.10 / 1_000_000  # $0.10 per 1M tokens for text-embedding-ada-002

splits = text_splitter.split_documents(docs)
encoding = tiktoken.encoding_for_model("text-embedding-ada-002")

total_tokens = 0
for chunk in splits:
    total_tokens += len(encoding.encode(chunk.page_content))

total_cost = total_tokens * model_cost
```
However, I am not sure how to compute the cost of each request to the retriever. For instance, the cost of this call:

```python
snippets = retriever.invoke(query)
```
I tried using `get_openai_callback`, but it didn’t work.
Can anyone help me, please?