Perhaps you have some calculator that you use personally and can suggest?
I want to see the estimated price for the system prompt (which the model reads) and then for the output it generates for the user.
What would the estimated pricing be?
You can use tiktoken to encode the text and count the tokens, then calculate the cost using the rates from Pricing | OpenAI.
Example:
import tiktoken

class CostCalc:
    # (input, output) price in USD per 1M tokens, from the OpenAI pricing page
    model_pricing = {
        "gpt-4o-mini": (0.15, 0.6),
        "gpt-4o": (5.0, 15.0),
    }

    def __init__(self, model: str) -> None:
        self.model = model
        self.encoding = tiktoken.encoding_for_model(model)

    def count_tokens(self, content: str) -> int:
        return len(self.encoding.encode(content))

    def calculate_input_cost(self, content: str) -> float:
        # per-token input price times token count
        return self.model_pricing[self.model][0] / 10**6 * self.count_tokens(content)

    def calculate_output_cost(self, content: str) -> float:
        # per-token output price times token count
        return self.model_pricing[self.model][1] / 10**6 * self.count_tokens(content)

calc = CostCalc("gpt-4o")
input_content = "Can you please help me with this?" * 1234
output_content = "Sure, I can help you with that." * 1234
print(calc.calculate_input_cost(input_content))
print(calc.calculate_output_cost(output_content))
It looks like the creator removed the pricing calculator from this easy-to-use token-counting site:
https://tiktokenizer.vercel.app/?model=gpt-4o
You have to do the math yourself: divide the model's per-1K price by a thousand, then multiply by the token count.
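For example, at an assumed gpt-4o input rate of $0.005 per 1K tokens (the same $5/1M used above), a 3,000-token system prompt works out like this:

price_per_1k = 0.005   # USD per 1K input tokens (assumed gpt-4o rate)
tokens = 3000          # e.g., a long system prompt
cost = price_per_1k / 1000 * tokens
print(f"${cost:.4f}")  # prints $0.0150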
With Assistants and its internal tools, or with uploaded documents, you have little ability to predict what will be charged per call or to see how many tokens were extracted from documents. A file search over enough documents can add 10k+ input tokens to a thread, and the thread keeps re-loading that context every time you continue interacting with it.
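As a rough sketch of how that compounds over a conversation (all the sizes here are assumptions, not measured values):

input_price_per_million = 5.00  # gpt-4o input rate used above
retrieval_tokens = 10_000       # assumed document context carried by the thread
per_turn_tokens = 500           # assumed size of each user/assistant exchange
total = 0.0
for turn in range(1, 6):
    # each turn re-sends the retrieved context plus the growing conversation
    input_tokens = retrieval_tokens + per_turn_tokens * turn
    total += input_price_per_million / 1_000_000 * input_tokens
    print(f"turn {turn}: ~{input_tokens} input tokens, running input cost ${total:.4f}")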
Here's a script I previously posted: a function that can accept whole message lists. It isn't updated for anything beyond standard text messages, and you need to use "o200k_base" as the encoder for "4o" models.
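A minimal sketch of that approach, iterating over a messages list, adding the per-message overhead values from OpenAI's cookbook example, and forcing "o200k_base" for "4o" models, looks like this (the overhead constants are assumptions and may drift from actual billing):

import tiktoken

def count_message_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    # "4o" models use the o200k_base encoding; select it explicitly in case
    # the installed tiktoken version doesn't map the model name yet.
    if "4o" in model:
        encoding = tiktoken.get_encoding("o200k_base")
    else:
        encoding = tiktoken.encoding_for_model(model)
    tokens = 0
    for message in messages:
        tokens += 3  # assumed per-message formatting overhead
        for value in message.values():
            tokens += len(encoding.encode(value))
    return tokens + 3  # assumed reply-priming overhead

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Can you please help me with this?"},
]
print(count_message_tokens(messages))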