How does ChatML do the exact formatting?

Hey @logankilpatrick. I’m trying to use tiktoken to pre-compute the exact number of tokens in my prompt before sending a request to the new Chat endpoint. I’m following the guidelines that you guys provide here for formatting the prompt from the list of messages, but the number of prompt tokens in completion.usage.prompt_tokens is always significantly lower than the count I get by formatting the prompt as described in the link. For instance:

messages = [{'role': 'system',
  'content': 'You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.'},
 {'role': 'user', 'content': 'Hello world!'},
 {'role': 'assistant', 'content': 'Hello there!'},
 {'role': 'system', 'content': 'Now, you are Elon Musk. Speak like him.'},
 {'role': 'user', 'content': 'Hello world!'}]

would be formatted as:

<|im_start|>system
You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
Hello there!<|im_end|>
<|im_start|>system
Now, you are Elon Musk. Speak like him.<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
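
Concretely, the formatting step looks like this (a rough sketch of what I’m doing; format_chatml is just my own helper, not anything from the API):

def format_chatml(messages):
    # Wrap each message in the ChatML delimiters, then prime the reply
    segments = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    segments.append("<|im_start|>assistant")
    return "\n".join(segments)

prompt = format_chatml(messages)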

According to tiktoken, this prompt has 129 tokens, but my API call says the prompt has 70. If I don’t include the special tokens <|im_start|> and <|im_end|>, I get closer, but still not quite there: 61 tokens. Is there any way we can pre-compute the exact number of tokens in our prompt before sending the actual request?
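
In case it helps, this is exactly how I’m counting (continuing from the prompt string built above):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
len(enc.encode(prompt))  # 129: the delimiter strings get tokenized as plain text

stripped = prompt.replace("<|im_start|>", "").replace("<|im_end|>", "")
len(enc.encode(stripped))  # 61: still not the 70 the API reports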
Thanks a lot!!

How many tokens do you get when you use tiktoken on the text in the messages list at the top?

You mean this guy?:

import tiktoken
encoding = tiktoken.get_encoding("gpt2")

def num_tokens_from_string(string, encoder) -> int:
    return len(encoder.encode(string))

s = 'You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.'
num_tokens_from_string(s, encoding)

Output: 22

The reason for this is that you have to account for the additional tokens the chat format wraps around each message.
This is explained in the Microsoft document titled “Learn how to work with the ChatGPT and GPT-4 models”, which includes Python code for calculating token counts (posting links isn’t allowed).

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Estimate the number of prompt tokens used by a list of chat messages."""
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <|im_start|>{role/name}\n{content}<|im_end|>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <|im_start|>assistant
    return num_tokens
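
As a quick sanity check, you can compare the estimate with what the API reports back (a sketch, reusing the messages list from the top of the thread):

import openai

estimate = num_tokens_from_messages(messages)
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo-0301", messages=messages)
print(estimate, completion.usage.prompt_tokens)  # the two counts should match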