The official tokenizer returns a completely different number of tokens than the API. I noticed this when I got an error. Since the difference is huge, I'll give approximate numbers. I tried to analyze the content of this article: azpolitika.info/?p=730071
Official tokenizer value: ~12k
According to the API: ~19k
7k is a huge gap between the API and the official tokenizer. It makes your service non-transparent; there is no way to estimate the price before running.
Thanks for the quick reply. I counted manually with cl100k_base and it also returns ~9k, which is even less than the official tokenizer. None of the tokenizers returns ~19k. That is roughly a factor of 2 more cost on OpenAI's side. It would be nice to have someone from OpenAI clarify this.
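Here is roughly how I counted, in case it helps (a Python sketch with tiktoken, which should match the Go library; article.txt is just an assumed file holding the plain article text):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
with open("article.txt", encoding="utf-8") as f:  # assumed: the plain text of the article
    text = f.read()
print(len(enc.encode(text)))  # comes out around 9k for the text I pasted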
None of the AI models that are public on the API accept 19k tokens either. Where did you get that figure? If it's from the usage log, there may be multiple requests combined into one report per 5 minutes.
The ultimate billed token count can be seen in the API response when you don’t use streaming.
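For example, with the pre-1.0 openai Python library (a sketch; the model name is just an example), the usage block of a non-streaming response carries the billed counts:

import openai

# assumes openai.api_key is already set
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=50,
    stream=False)             # usage is only included when not streaming
print(response["usage"])      # prompt_tokens, completion_tokens, total_tokens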
The tokenizer site preloads a template for you as if you were sending an input. Clear out all the data until the token count reads 0, then paste a response if you want to measure the text sent back to you.
The price shown is for input, which is 75% the price of output on gpt-3.5-turbo.
Here's what I just started a GPT-4 conversation with (ironically, about tokens):
That's exactly how I caught it. I noticed an error in my logs for the reason you mentioned. It made me curious, so I checked the official tokenizer and other sources, but none of them returned a ~19k result.
Error message from the API:
This model's maximum context length is 16385 tokens. However, your messages resulted in 18870 tokens
You need the rest of the message, where it says how much you sent PLUS the amount you reserved for output as max_tokens.
openai.error.InvalidRequestError: This model’s maximum context length is 4097 tokens. However, you requested 12625 tokens (280 in the messages, 12345 in the completion). Please reduce the length of the messages or completion.
That’s me getting rejected for trying to reserve too much of the context length for a response.
And here’s the actual API response from -16k, so I think some software you’re using is re-writing the error:
openai.error.InvalidRequestError: This model’s maximum context length is 16385 tokens. However, you requested 23736 tokens (280 in the messages, 23456 in the completion). Please reduce the length of the messages or completion.
No software built by me; I am getting this error from OpenAI. I am just using the github.com/tiktoken-go/tokenizer library to count, which returns ~9k, probably the same as the service you shared.
The system prompt is just 31 tokens. I am not using past conversation; I create a new chat completion each time.
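This is roughly how I estimate the whole messages payload, for what it's worth (a sketch; the per-message overhead numbers are my approximations, not official values):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_message_tokens(messages):
    total = 3  # assumed overhead priming the assistant's reply
    for m in messages:
        total += 4  # assumed per-message overhead for role/formatting
        total += len(enc.encode(m["content"]))
    return total

print(estimate_message_tokens([{"role": "system", "content": "You are a helpful assistant"}]))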
Maybe there is some kind of re-encoding going on, for example, Unicode that gets written as different byte sequences.
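One quick way to check that theory (a sketch; the sample string and the expectation that decomposed Unicode costs more tokens are assumptions on my part):

import unicodedata
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sample = "müstəqillik, görüş, ölkə"                # assumed sample with non-ASCII letters
nfc = unicodedata.normalize("NFC", sample)         # composed form ("ü" as a single code point)
nfd = unicodedata.normalize("NFD", sample)         # decomposed form (base letter + combining mark)

print(len(enc.encode(nfc)), len(enc.encode(nfd)))  # the decomposed form usually encodes to more tokens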
The API doesn’t count wrong. The same user message from that screenshot, sent to -16k by Python API code:
import openai

# assumes model = "gpt-3.5-turbo-16k" and user = the article text from the screenshot
response = openai.ChatCompletion.create(
    messages=[{"role": "system", "content": "You are a helpful assistant"},
              {"role": "user", "content": user}],
    model=model,
    top_p=0.0, stream=True, max_tokens=23456)
openai.error.InvalidRequestError: This model’s maximum context length is 16385 tokens. However, you requested 30387 tokens (6931 in the messages, 23456 in the completion). Please reduce the length of the messages or completion.