Counting tokens for chat API calls (gpt-3.5-turbo)

There’s a great resource in the OpenAI Cookbook on GitHub, if you haven’t found it yet…

This one explains why the token count is a bit different with ChatGPT… and what you can do…

Counting tokens for chat API calls

ChatGPT models like gpt-3.5-turbo use tokens in the same way as other models, but because of their message-based formatting, it’s more difficult to count how many tokens will be used by a conversation.

Below is an example function for counting tokens for messages passed to gpt-3.5-turbo-0301.

The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate. The ChatML documentation explains how messages are converted into tokens by the OpenAI API, and may be useful for writing your own function.
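The accounting described above can be sketched as follows. This is a simplified version modeled on the Cookbook’s example for gpt-3.5-turbo-0301; the `count_fn` parameter is my own addition so the sketch doesn’t hardcode a tokenizer (in practice you’d pass something like `lambda s: len(tiktoken.get_encoding("cl100k_base").encode(s))`), and the exact per-message overhead may differ for later model versions:

```python
def num_tokens_from_messages(messages, count_fn):
    """Approximate token count for a chat request (gpt-3.5-turbo-0301 style).

    messages: list of dicts like {"role": "user", "content": "..."}
    count_fn: callable mapping a string to its token count
    """
    num_tokens = 0
    for message in messages:
        # every message follows <|im_start|>{role/name}\n{content}<|im_end|>\n
        num_tokens += 4
        for key, value in message.items():
            num_tokens += count_fn(value)
            if key == "name":
                # if a name is present, it replaces the role, saving one token
                num_tokens -= 1
    # every reply is primed with <|im_start|>assistant
    num_tokens += 2
    return num_tokens
```

With a dummy word-count tokenizer, a single user message `{"role": "user", "content": "hello world"}` comes out to 4 (wrapper) + 1 (role) + 2 (content) + 2 (reply priming) = 9 — the point being that the chat wrapper adds only a handful of tokens per message, not hundreds.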

Learn more (including code) at the Source


Yeeeeees, I was so looking for this. Thanks a lot @PaulBellow.
A little bit discouraging to see that the exact token calculation depends on the model.

Another relevant aspect of this notebook is that it seems to clarify the role of the key “name” in the messages structure. Kind of. It says that, if there is a name, “role” is omitted from the tokenization… :thinking:


I’m having an interesting situation. I’ve looked through the above, and it doesn’t account for my experience.

Right now I have a prompt that I’m testing in the playground that is counted as 3,622 tokens (according to the bottom right box in the UI), and with max_tokens set to 200, completion in the playground works with text-davinci-003.

However, when I switch to Chat mode and use gpt-3.5-turbo (in fact, all I have to do is toggle the dropdown to Chat and it switches, leaving all settings and my prompt in place), when I submit the form, I get:

This model’s maximum context length is 4097 tokens. However, you requested 4244 tokens (4044 in the messages, 200 in the completion)

It’s just a single user message. And between the 2 modes, that’s a difference of 422 tokens. I’m also replicating this when I call the API endpoint in my code.

I can’t imagine how {"role": "user", "content":} is an additional 400+ tokens. I’m not sending anything in the system message.

Any ideas of what might be going on? It almost feels like a bug in the token counter.


I figured it out. After looking around, it turns out that gpt-3.5-turbo and gpt-4 use the cl100k_base encoding, while the older completion models like text-davinci-003 use a different one — so the Playground’s counter was measuring the prompt with a different tokenizer in each mode.


They’ve optimized it for other languages! Feels cool — now Ukrainian consumes only about twice as many tokens as English (instead of six times as many, as it was before).