What is the reason for adding a total of 7 tokens?

A total of 7 tokens are added in the OpenAI API sample code below. Please tell me the reason and calculation logic.

OpenAI API Sample Code

tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n

Why 4 tokens?
“<|start|>{role/name}\n{content}<|end|>\n” is not 4 tokens.

num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

Why 3 tokens?
“<|start|>assistant<|message|>” is not 3 tokens.


What, exactly, are you trying to ask?


That is four tokens: 100264, 882, 100266, 100265.


That is three tokens: 100264, 78191, 100266.
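Laying those ids out side by side makes the 4-vs-3 split visible (the id-to-string mapping here is taken from the numbers reported in this thread, not from official documentation):

```python
# Special-token ids for the cl100k_base chat format, as quoted in this thread
SPECIAL = {100264: "<|im_start|>", 100265: "<|im_end|>", 100266: "<|im_sep|>"}

message_frame = [100264, 882, 100266, 100265]  # 882 encodes "user"
reply_primer = [100264, 78191, 100266]         # 78191 encodes "assistant"

print(len(message_frame), len(reply_primer))   # 4 overhead tokens vs. 3
```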


“<|im_start|>user<|im_sep|><|im_end|>” is 22 tokens.
<|im_start|> is 1 token?

You’re not using the right tokenizer.

Anything inside of (and including) the <|…|> delimiters is a special token, used to inform ChatGPT about the parts of the messages.

I see. But why does every message take 4 tokens, and why is every reply primed with 3 tokens? I would like to know.

  1. Because the user message needs to be identified as such. That takes 4 tokens, the start and end, the user role identifier, and the separator between the role and the message.
  2. The same as 1 without the end which gets generated in the response. That’s 3 tokens.

Thank you.
Do you know of a website about these special tokens?

You will note in the link that, for current models, the values are different (the original gpt-3.5-turbo-0301 internal model endpoint is the highlighted exception):

if model in {"gpt-3.5-turbo-0613", "gpt-4-0613"}:  # plus the other dated models listed there
    tokens_per_message = 3
    tokens_per_name = 1

They don’t just include an obfuscated comment for the code; there is no comment at all.

Overhead is different on these.

(edit: see later post for calculation with bare tokenizer input)

A set tokens_per_name, though, is unreliable; the colon is not always an additional token:


The final three overhead tokens come from the “assistant:” priming injected at the end.

So tokens_per_name should be taken into account when using the name optional parameter.
tokens_per_name is not always 1 token.
Do I understand correctly?

tokens_per_name = 1 is correct, unless you provide name inputs for which it isn’t (where a single token like “:x”, as demonstrated above, or even “:name” exists and would be used).

The overhead of one message = 7 billed tokens, the overhead of two = 11, three = 15.
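The arithmetic behind those numbers, assuming the newer-model framing (3 framing tokens plus a one-token role per message, plus a 3-token reply primer):

```python
def chat_overhead(num_messages: int) -> int:
    tokens_per_message = 3  # <|im_start|>, the role/content separator, <|im_end|>
    role_tokens = 1         # "user", "assistant", and "system" each encode to one token
    reply_primer = 3        # <|im_start|>assistant<|im_sep|> appended to the prompt
    return num_messages * (tokens_per_message + role_tokens) + reply_primer

print(chat_overhead(1), chat_overhead(2), chat_overhead(3))  # 7 11 15
```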

This is a calculation scheme that is not broken by any reasonable inputs:
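One way to sketch such a scheme, following the structure of the cookbook’s num_tokens_from_messages (the encoder is passed in as a callable so the sketch stays self-contained; in practice you would pass tiktoken.get_encoding("cl100k_base").encode):

```python
def num_tokens_from_messages(messages, encode):
    """Estimate billed prompt tokens; `encode` maps text to a list of token ids."""
    tokens_per_message = 3  # <|im_start|>{role}<|im_sep|>{content}<|im_end|> framing
    tokens_per_name = 1     # extra separator token when the optional "name" field is set
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encode(value))  # role, content, and name all get counted
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|im_start|>assistant<|im_sep|>
    return num_tokens
```

With a toy whitespace tokenizer, [{"role": "user", "content": "hi"}] comes out to 3 + 1 + 1 + 3 = 8, matching the measured single-message counts reported later in this thread.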



Hi. Where did you find out the token IDs for those special tokens? I can only see the IDs of some of the special tokens using tiktoken (like <|endoftext|>) but not the rest. Thanks!

What is the difference in how special tokens in newer models are added (why is tokens_per_message 3 instead of 4)?

gpt-3.5-turbo-0301: 9 prompt tokens
gpt-3.5-turbo-0613: 8 prompt tokens
gpt-3.5-turbo-1106: 8 prompt tokens
gpt-3.5-turbo-16k-0613: 8 prompt tokens
gpt-4-0314: 8 prompt tokens
gpt-4-0613: 8 prompt tokens
gpt-4-1106-preview: 8 prompt tokens
gpt-4-vision-preview: 8 prompt tokens

I see no difference. The overhead per message is 3 tokens plus the role token such as “assistant”. gpt-3.5-turbo-0301 used an older ChatML format.
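The one-token difference in those measurements falls out of the per-message constants (assuming a single message whose role and content each encode to one token, e.g. "user" and "hi"):

```python
role, content, reply_primer = 1, 1, 3  # one-token role, one-token content, 3-token primer

old = 4 + role + content + reply_primer  # gpt-3.5-turbo-0301: tokens_per_message = 4
new = 3 + role + content + reply_primer  # 0613 and later:     tokens_per_message = 3
print(old, new)  # 9 8
```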