How to calculate tokens from data bound to a vector database

I need to know how to calculate the tokens used and how to estimate the cost when using a vector database with OpenAIEmbeddings. I also need to know how the vectors are counted as tokens.

You can use tiktoken to tokenise and count the input text: GitHub - openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Then multiply the cost per 1,000 tokens by the number of tokens in your corpus divided by 1,000. For example, 500,000 tokens at $0.0001 per 1,000 tokens costs 500,000 / 1,000 × $0.0001 = $0.05.

A rough estimate of the token count is 1/4 of the number of bytes in the dataset.
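If you just want a ballpark figure before running a tokenizer, here is a minimal TypeScript sketch for Node.js based on that bytes / 4 rule of thumb. The function name is illustrative, and the $0.0001 figure is the ada embedding price quoted later in this thread, so check the current pricing page; exact counting with tiktoken is shown further down.

```typescript
// Rough cost estimate for embedding a corpus:
//   tokens ≈ bytes / 4
//   cost   = (tokens / 1000) × price per 1,000 tokens
const ADA_EMBEDDING_PRICE_PER_1K_TOKENS = 0.0001; // USD, illustrative; verify current pricing

function estimateEmbeddingCost(text: string): { tokens: number; cost: number } {
  const bytes = Buffer.byteLength(text, "utf8"); // size of the corpus in bytes
  const tokens = Math.ceil(bytes / 4);           // rough rule of thumb: ~4 bytes per token
  const cost = (tokens / 1000) * ADA_EMBEDDING_PRICE_PER_1K_TOKENS;
  return { tokens, cost };
}

// Example: a 2 MB product catalogue ≈ 500,000 tokens ≈ $0.05 to embed once.
const { tokens, cost } = estimateEmbeddingCost("your corpus text here");
console.log(`~${tokens} tokens, ~$${cost.toFixed(4)}`);
```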


Thank you for your reply. I'm using Node.js; could you please tell me the best way to implement it in Node.js?

Sure, you can check out GitHub - ceifa/tiktoken-node: OpenAI's tiktoken but with node bindings.
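A minimal sketch of counting tokens with that package. It assumes tiktoken-node exposes an `encodingForModel()`/`encode()` API as in its README, so verify the exact imports against the package's current documentation before relying on it.

```typescript
// Assumed tiktoken-node API (encodingForModel / encode); check the README for the exact shape.
import tiktoken from "tiktoken-node";

// gpt-3.5-turbo and text-embedding-ada-002 both use the cl100k_base encoding,
// so this count also applies to your embedding inputs.
const enc = tiktoken.encodingForModel("gpt-3.5-turbo");

function countTokens(text: string): number {
  return enc.encode(text).length; // encode() returns an array of token ids
}

const productDescription = "Red ceramic mug, 350 ml, dishwasher safe.";
console.log(countTokens(productDescription)); // exact token count for this text
```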


Thank you,
I have a question, please: if I have a list of 100 products with details for each one, and I embed them and then send the data to OpenAI, will the token count be the same as the token count of the products before embedding?

You can embed your company documentation one time, then run queries against that embedding database. Each time you run a new query against the embedding database you must first generate a vector to search against; the cost of this is very small, $0.0001 per 1,000 tokens, as it uses the ada model. Once you have created the vector and used it to retrieve your context, you can then call the standard gpt-3.5 or gpt-4 model with the embedding data as context and run your original query with that.
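A minimal sketch of that query flow using the official `openai` Node package (v4-style API). The `searchVectorDb` helper is hypothetical, a stand-in for whatever vector database client you actually use (Pinecone, pgvector, etc.):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical stand-in for your vector database lookup; replace with your store's client.
async function searchVectorDb(queryVector: number[]): Promise<string> {
  return "top matching product details retrieved from your vector store";
}

async function answer(question: string): Promise<void> {
  // 1. Embed the query (billed at the ada embedding rate, $0.0001 per 1,000 tokens).
  const embedding = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });
  const queryVector = embedding.data[0].embedding;
  console.log(`embedding tokens billed: ${embedding.usage.total_tokens}`);

  // 2. Retrieve the most relevant documents from the vector database.
  const context = await searchVectorDb(queryVector);

  // 3. Call the chat model with the retrieved documents as context
  //    (these context tokens are billed at the chat model's rate).
  const chat = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  console.log(chat.choices[0].message.content);
  console.log(`chat tokens billed: ${chat.usage?.total_tokens}`);
}

answer("Which mugs are dishwasher safe?");
```

Note that the embedding vectors themselves are not billed again when you query them; you pay for the tokens of the query you embed and for the tokens of the context plus question you send to the chat model.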
