Official token count differs from OpenAI tokenizer

I’ve been trying to measure the number of tokens in my request. My measurement agrees with the OpenAI Tokenizer, but differs from what the OpenAI usage page shows.

This is the prompt and response I got and the respective counts.

Joker is a clown that tells a funny joke when asked to. If a topic is specified in the prompt, the joke needs to be relevant.

Q: Tell a joke.
Joker: What’s the best thing about Switzerland? I don’t know, but the flag is a big plus.

Q: Tell a joke.
Joker: Three men enter a bar in the USSR. One says, ‘Why did Stalin only write in lowercase?’ The other one says, ‘Because he was afraid of capitalism.’

Q: Tell a joke about games.
Joker: Yesterday I saw a guy spill all his Scrabble letters on the road. I asked him, ‘What’s the word on the street?’

Q: Tell a joke.


What do you call a woman who is working on her computer? A gamer.

Note: There is a single space character before the response.

          | OpenAI Tokenizer | OpenAI Usage | My Measurement
Prompt    | 167              | 167          | 167
Response  | 16               | 24           | 16

Does anybody know how I can get the official number of tokens? For now I am using GPT2TokenizerFast.

Which usage figure are you seeing that differs from the numbers you calculated? The usage is the number of prompt tokens plus completion tokens.

On this page Account Usage, for that request I see this:
which is different than the numbers I calculated (167 + 16).
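To make the mismatch concrete, here is the arithmetic using only the counts from the table in the first post (167 prompt tokens, 16 completion tokens by my measurement, 24 by the usage column). The function name `total_tokens` is just for illustration.

```python
def total_tokens(prompt_tokens, completion_tokens):
    """Total usage is prompt tokens plus completion tokens."""
    return prompt_tokens + completion_tokens


measured = total_tokens(167, 16)   # my local measurement
reported = total_tokens(167, 24)   # counts from the usage column of the table

print(measured)             # 183
print(reported)             # 191
print(reported - measured)  # 8 tokens unaccounted for, all in the completion
```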

You said your second response was 24 tokens. There’s a small delay on the billing page - it might be that?

I am 100% sure that the billing entry corresponds to that query, since it was the last one of the hour as you can see:

Sorry I messed up in my previous response by sending you a different billing entry (even though they have the same values).

Hi there, whitespace and new lines also add tokens (which you may have stripped?). I see that in your screenshot there are also values after the written completion ("| 183…").

“| 183” is not part of the response and it has been added for debugging purposes.

I repeated the measurements with 4 different responses and the same prompt as in the original post. The text in parentheses (bold in the original post) is not part of the response; it shows my calculated prompt + response tokens.

I don’t know, but I think it’s a good thing.(167 + 14)

This joke is so dirty that even my mom would blush.(167 + 12)

What did one monkey say to another after a close call?(167 + 12)

I went to the supermarket to buy my wife a present. I saw her favourite brand of beer, so I bought it for her. But she wouldn’t accept it, saying that we don’t drink beer anymore, and that there’s no point in buying something which will soon be out of date. I asked her why we stopped drinking beer, and she said that after a week of not drinking any alcohol at all she was going crazy with withdrawal symptoms.(167 + 91)

Note: There is a single space character before every response

This is my request breakdown for those 4 queries:

Unfortunately the last two have been merged into a single billing entry, but the completion values differ a bit from my calculations.

Hi there, would it be possible to share a playground link so I could try to replicate this behaviour? Right now I’m only seeing pasted completions, and it’d be great to see the behaviour first-hand.

Here is the link: Playground

Hi there, I was not able to replicate the behaviour you’re discussing.

After pasting the response did you add a newline? Because I get 24 tokens without it.

What browser are you using? Sometimes copypaste behavior is inconsistent when dealing with trimmable whitespace characters.
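One way to rule out copy-paste differences is to print the whitespace at the edges of the string before tokenizing it. The sketch below uses only the standard library; `edge_whitespace` is a hypothetical helper name, and the trailing newline in the example is invented to show what a stray newline would look like.

```python
from typing import Tuple


def edge_whitespace(text: str) -> Tuple[str, str]:
    """Return the leading and trailing whitespace of text."""
    leading = text[: len(text) - len(text.lstrip())]
    trailing = text[len(text.rstrip()):]
    return leading, trailing


# Completion from the first post, with its single leading space and a
# hypothetical trailing newline added for illustration.
completion = " What do you call a woman who is working on her computer? A gamer.\n"
leading, trailing = edge_whitespace(completion)
print(repr(leading), repr(trailing))  # ' ' '\n'
```

Using `repr` makes spaces and newlines visible, so it is easy to see whether the text being counted locally is byte-for-byte what the API returned.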

Google Chrome Version 90.0.4430.212 (Official Build) (64-bit) on Linux

First of all, why do we even need to calculate tokens ourselves with something like the GPT-2 tokenizer? If the dashboard knows how many tokens we used, why can’t the GPT-3 API just send the same number in the response body? @joey, please let me know if I am wrong here, because this is really important when we run this in production.

And @edoardo.luciani, did you find a workaround?

Following up on this - barring progress on the token count being included in the API, could you confirm that HF’s GPT2Tokenizer.from_pretrained("gpt2") is the one we should be using (versus something from GPT2TokenizerFast)?