The official tokenizer returns a completely different number of tokens than the API. I noticed this when I got an error. Since the difference is huge, I'll give approximate numbers. I tried to analyze the content of this article: azpolitika.info/?p=730071
Official tokenizer value: ~12k
According to the API: ~19k
7k is a huge gap between the API and the official tokenizer. It makes your service non-transparent; there is no way to estimate the price before running.
Thanks for the quick reply. I counted manually with cl100k_base and it also returns ~9k, which is even less than the official tokenizer. None of the tokenizers returns ~19k. That is roughly a factor of 2 more cost on OpenAI's side. It would be nice to have someone from OpenAI clarify this.
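Here is roughly how I counted, in case it helps (a Python sketch with tiktoken, which should match the Go library; article.txt is just an assumed file holding the plain article text):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
with open("article.txt", encoding="utf-8") as f:  # assumed: the plain text of the article
    text = f.read()
print(len(enc.encode(text)))  # comes out around 9k for the text I pasted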
None of the AI models that are public on the API accept 19k tokens either. Where did you get that figure? If it's from the usage log, there may be multiple requests combined into one report per 5 minutes.
The ultimate billed token count can be seen in the API response when you don’t use streaming.
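For example, with the pre-1.0 openai Python library (a sketch; the model name is just an example), the usage block of a non-streaming response carries the billed counts:

import openai

# assumes openai.api_key is already set
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=50,
    stream=False)             # usage is only included when not streaming
print(response["usage"])      # prompt_tokens, completion_tokens, total_tokens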
The tokenizer site preloads a template for you as if you were sending an input. Clear out all the data until the token count reads 0, then paste a response if you want to measure the text sent back to you.
The price shown is for input, which is 75% the price of output on gpt-3.5-turbo.
Here's what I just started a GPT-4 conversation with (ironically, about tokens):
That's exactly how I caught it. I noticed an error in my logs for the reason you mentioned. It made me curious, so I checked the official tokenizer and other sources, but none of them returned a ~19k result.
Error message from the API:
This model's maximum context length is 16385 tokens. However, your messages resulted in 18870 tokens
You need the rest of the message, where it says how much you sent PLUS the amount you reserved for output as max_tokens.
openai.error.InvalidRequestError: This model’s maximum context length is 4097 tokens. However, you requested 12625 tokens (280 in the messages, 12345 in the completion). Please reduce the length of the messages or completion.
That’s me getting rejected for trying to reserve too much of the context length for a response.
And here’s the actual API response from -16k, so I think some software you’re using is re-writing the error:
openai.error.InvalidRequestError: This model’s maximum context length is 16385 tokens. However, you requested 23736 tokens (280 in the messages, 23456 in the completion). Please reduce the length of the messages or completion.
No software built by me; I am getting this error from OpenAI. I am just using the github.com/tiktoken-go/tokenizer library to count, which returns ~9k, probably the same as the service you shared.
The system prompt is just 31 tokens. I am not using past conversation; I create a new chat completion each time.
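This is roughly how I estimate the whole messages payload, for what it's worth (a sketch; the per-message overhead numbers are my approximations, not official values):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_message_tokens(messages):
    total = 3  # assumed overhead priming the assistant's reply
    for m in messages:
        total += 4  # assumed per-message overhead for role/formatting
        total += len(enc.encode(m["content"]))
    return total

print(estimate_message_tokens([{"role": "system", "content": "You are a helpful assistant"}]))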
Maybe there is some kind of re-encoding going on, for example, Unicode that gets written as different byte sequences.
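One quick way to check that theory (a sketch; the sample string and the expectation that decomposed Unicode costs more tokens are assumptions on my part):

import unicodedata
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sample = "müstəqillik, görüş, ölkə"                # assumed sample with non-ASCII letters
nfc = unicodedata.normalize("NFC", sample)         # composed form ("ü" as a single code point)
nfd = unicodedata.normalize("NFD", sample)         # decomposed form (base letter + combining mark)

print(len(enc.encode(nfc)), len(enc.encode(nfd)))  # the decomposed form usually encodes to more tokens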
The API doesn’t count wrong. The same user message from that screenshot, sent to -16k by Python API code:
import openai

# assumes model = "gpt-3.5-turbo-16k" and user = the article text from the screenshot
response = openai.ChatCompletion.create(
    messages=[{"role": "system", "content": "You are a helpful assistant"},
              {"role": "user", "content": user}],
    model=model,
    top_p=0.0, stream=True, max_tokens=23456)
openai.error.InvalidRequestError: This model’s maximum context length is 16385 tokens. However, you requested 30387 tokens (6931 in the messages, 23456 in the completion). Please reduce the length of the messages or completion.