Using the API, the token count is off

I am not sure why I am getting a higher token count; maybe I am misunderstanding how tokens are counted. See the image below. I made a Postman request saying just "hi", and the response came back as "Hello! How can I assist you today?" The completion count of 9 tokens looks correct, but why does just saying "hi" cost 8 prompt tokens? I expected the prompt token count to be 1 in this instance.

Is this behaviour stable and reproducible, or a single occurrence?

It's very simple: ask ChatGPT 🙂
There are some “additional tokens involved for processing the message”.

I need to count tokens, so I need to know your internal representation of the prompt when I send "hi" to gpt-3.5-turbo in plain text.

When you send a message like “hi” to GPT-3.5-turbo, the internal representation of the prompt in terms of tokenization is quite straightforward. The word “hi” itself constitutes one token. However, there are additional tokens involved for processing the message, such as special tokens used to denote the start and end of a prompt, or tokens representing spaces or punctuation if present.

For the message “hi”, the token breakdown would typically be:

  1. A special token to indicate the start of the input (like <|startoftext|> in some models).
  2. The token for “hi”.
  3. Potentially a token for the end of the input or a separator (like <|endoftext|>).

In total, this would typically amount to 2-3 tokens, depending on the exact formatting and the model’s tokenization rules. It’s important to note that the actual token count can vary slightly depending on the specific implementation and version of the model being used.


It's stable; I replicated this behaviour in Postman.
The additional tokens are part of how each message is packaged for the model.


Wrapping each message in special tokens causes the token overhead: the role name "user" is prefixed to what the user wrote, and "assistant" is also inserted where the AI should start writing.

You can send “hi” to a completion model instead and only be billed for what you see. What you get back might not be so concise in token usage though…
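To make that hidden text visible, here is a minimal sketch that renders chat messages into a ChatML-style envelope. The `<|im_start|>`/`<|im_end|>` markers follow the ChatML format OpenAI published early on; the exact tokens and layout current models use are not public, so treat them as an assumption. The function name `render_chatml` is hypothetical.

```python
def render_chatml(messages: list[dict[str, str]]) -> str:
    """Render chat messages into a ChatML-style string (illustrative only)."""
    parts = []
    for message in messages:
        # Each message is wrapped: start marker, role, newline, content, end marker.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    # The prompt ends by opening an assistant message, priming the model to reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "hi"}]))
```

Even for a one-word message, the role name, the markers and the assistant primer all end up in the prompt the model reads, which is where the extra billed tokens come from.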

Here you have a very good explanation of these special tokens, which create an envelope for the message:
and here is part of the token-counting code (section 6, "Counting tokens for chat completions API calls"):

model = "gpt-3.5-turbo-0301"  # the model whose prompt is being counted
if model == "gpt-3.5-turbo-0301":
    tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n

That should explain why the prompt count for "hi" is not 1.
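The excerpt is one branch of a larger counting routine. Below is a minimal sketch modeled on the cookbook's `num_tokens_from_messages`, except that it takes the encoder as a parameter so it runs without extra packages. The defaults (3 tokens per message, 1 per name, 3 for reply priming) are the values the cookbook uses for later gpt-3.5-turbo snapshots; the excerpt above shows 4 per message for gpt-3.5-turbo-0301. `toy_encode` is a hypothetical whitespace splitter standing in for the real tokenizer.

```python
from typing import Callable


def num_tokens_from_messages(
    messages: list[dict[str, str]],
    encode: Callable[[str], list],
    tokens_per_message: int = 3,
    tokens_per_name: int = 1,
) -> int:
    """Estimate prompt tokens for a chat request (sketch of the cookbook approach)."""
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message  # envelope around every message
        for key, value in message.items():
            num_tokens += len(encode(value))  # role, content (and name) are all encoded
            if key == "name":
                num_tokens += tokens_per_name  # extra cost when a name field is present
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens


def toy_encode(text: str) -> list[str]:
    # Hypothetical stand-in tokenizer: splits on whitespace. NOT the model's tokenizer.
    return text.split()


print(num_tokens_from_messages([{"role": "user", "content": "hi"}], toy_encode))  # 8
```

With the default of 3 per message, the toy count for a single "hi" message comes out to 3 + 1 ("user") + 1 ("hi") + 3 = 8, the same number reported in the original question. For real counts, install `tiktoken` and pass `tiktoken.encoding_for_model("gpt-3.5-turbo").encode` instead of `toy_encode`; the real tokenizer splits text differently from whitespace.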


Thank you, everyone, for responding. Wrapping the user's text in added system decoration, which increases the token count, makes sense, and OpenAI seems to have it documented in the Python notebook. I expected the same overhead on the returned (completion) tokens, but that doesn't seem to be the case.

BTW, I did add the additional decorations, and I still couldn't make it come out to 8 tokens 🙁

I went down this rabbit hole a while back and wrote up a blog post about it, and published a JavaScript library to help estimate the token counts.

Yeah, it would be great if OpenAI were to comment so the mystery of tokens can be resolved. Keeping customers in limbo while charging them is not what I would expect.

The current version of ChatML, and the special tokens trained for different circumstances, are not published. However, when you account for the correct special tokens (each counts as one token and cannot be joined with neighbouring text) and the unseen text that is inserted, your count will come out right.

<|fim_prefix|>system<|fim_middle|>You are not ChatGPT<|fim_suffix|><|fim_prefix|>user:name_field<|fim_middle|>Bake a cake<|fim_suffix|><|fim_prefix|>assistant<|fim_middle|>

Overhead: 7 tokens for the first message, 4 for each additional message
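Read as a formula, and assuming every message carries only a role and content (no name field, which adds more), the rule above gives the prompt count for $n$ messages, where $c_i$ is the number of tokens in the content of message $i$. For the single "hi" message from the original question ($n = 1$, $c_1 = 1$) it gives 8, matching what Postman reported.

```latex
N_{\text{prompt}} = 7 + 4\,(n - 1) + \sum_{i=1}^{n} c_i,
\qquad
N_{\text{prompt}}\big|_{n=1,\;c_1=1} = 7 + 0 + 1 = 8
```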


Cool! I am curious: what does <|endofprompt|> do?