How does ChatML do the exact formatting?

Hey @logankilpatrick. I’m trying to pre-compute the exact number of tokens in my prompt before sending a request to the new Chat endpoint, using tiktoken. I’m following the guidelines you provide here to format the prompt from the list of messages. But the number of prompt tokens reported in completion.usage.prompt_tokens is always significantly lower than the one I get by formatting the prompt as described in the link. For instance:

messages = [
    {'role': 'system',
     'content': 'You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.'},
    {'role': 'user', 'content': 'Hello world!'},
    {'role': 'assistant', 'content': 'Hello there!'},
    {'role': 'system', 'content': 'Now, you are Elon Musk. Speak like him.'},
    {'role': 'user', 'content': 'Hello world!'},
]

would be formatted as:

<|im_start|>system
You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
Hello there!<|im_end|>
<|im_start|>system
Now, you are Elon Musk. Speak like him.<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
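
In code, that formatting step looks roughly like this (a minimal sketch; the helper name is made up):

def messages_to_chatml(messages) -> str:
    # Wrap each message as <|im_start|>{role}\n{content}<|im_end|>\n ...
    prompt = ''
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # ... and leave the final assistant turn open so the model writes the reply.
    prompt += '<|im_start|>assistant'
    return prompt

Calling messages_to_chatml(messages) on the list above should reproduce that text.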

According to tiktoken, this prompt has 129 tokens, but my API call says the prompt has 70 tokens.
If I don't include the special tokens <|im_start|> and <|im_end|>, I get close, but not quite: 61 tokens. Is there any way we can pre-compute the exact number of tokens in our prompt before sending the actual request?
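
The closest thing to an exact pre-computation I've found is the per-message overhead approach from OpenAI's tiktoken cookbook: count the message contents with the chat models' own encoding and add fixed constants for the <|im_start|>/<|im_end|> framing, instead of encoding those special tokens as plain text. A sketch (the constants are the cookbook's values for gpt-3.5-turbo-0301 and may change between model versions):

import tiktoken

def num_tokens_from_messages(messages, model='gpt-3.5-turbo-0301') -> int:
    encoding = tiktoken.encoding_for_model(model)  # cl100k_base for the chat models
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <|im_start|>{role}\n{content}<|im_end|>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    num_tokens += 2  # every reply is primed with <|im_start|>assistant
    return num_tokens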
Thanks a lot!!

How many tokens do you get when you use tiktoken on the text in the messages list at the top?

You mean this guy?:

import tiktoken
encoding = tiktoken.get_encoding("gpt2")

def num_tokens_from_string(string: str, encoder) -> int:
    # The token count is just the length of the encoded string.
    return len(encoder.encode(string))

s = 'You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.'
num_tokens_from_string(s, encoding)

Output: 22
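
One thing worth double-checking: the chat models don't use the gpt2 encoding; they use cl100k_base, which tokenizes differently, so counts from the two encoders won't line up. A quick comparison using the same helper as above:

chat_encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo
num_tokens_from_string(s, chat_encoding)  # will generally differ from the gpt2 count of 22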