Prompt_tokens vs tiktoken.encoding_for_model().encode()

Hi there,

I see a mismatch in token counting.

import tiktoken

print( tiktoken.encoding_for_model('gpt-3.5-turbo').encode('salute me!') )

# [19776, 1088, 757, 0] <--- 4 tokens


import openai
openai.api_key = '<<REDACTED>>'

response = openai.ChatCompletion.create(
  model='gpt-3.5-turbo',
  messages=[
    {'role': 'system', 'content': 'salute me!'},
  ]
)

print(response)

#{
#  "id": "<<REDACTED>>",
#  "object": "chat.completion",
#  "created": 1691063916,
#  "model": "gpt-3.5-turbo-0613",
#  "choices": [
#    {
#      "index": 0,
#      "message": {
#        "role": "assistant",
#        "content": "Hello! How can I assist you today?"
#      },
#      "finish_reason": "stop"
#    }
#  ],
#  "usage": {
#    "prompt_tokens": 11,  <--- 11 tokens?
#    "completion_tokens": 9,
#    "total_tokens": 20
#  }
#}

Why is there a difference between the tokenizer's token count and the prompt_tokens count? 4 vs 11

How are tokens actually calculated?

Welcome to the forum!

The tiktoken call gives you the token count for that string alone; the API call adds extra tokens for the chat format's message boundary markers and for priming the assistant's reply.

For basic calls you'll find this overhead to be a fixed value, e.g. 7 additional tokens for a single message like yours.
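
For your single-message example, those 7 extra tokens are roughly accounted for like this. The per-message framing and reply-priming values below are my assumptions based on the chat format used by the -0613 models, so treat them as an estimate rather than an official breakdown:

import tiktoken

enc = tiktoken.encoding_for_model('gpt-3.5-turbo')

content_tokens  = len(enc.encode('salute me!'))  # 4
role_tokens     = len(enc.encode('system'))      # 1
message_framing = 3  # assumed per-message framing tokens (start/role/end markers)
reply_priming   = 3  # assumed tokens priming the assistant's reply

print(content_tokens + role_tokens + message_framing + reply_priming)  # 11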

Thank you for the reply! Does the Python library have a way to calculate these values out of the box?

I’ve only ever added the 7 tokens to my counts. There may be some functions in the cookbook, and I seem to remember a conversation here some time ago where people were experimenting to pin down the exact number.

The takeaway is that the count is accurate once you add a fixed offset. I think function calls add some extra tokens that I haven’t experimented with calculating yet.
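
For something closer to out of the box, the OpenAI cookbook has a num_tokens_from_messages example that does roughly the following. This is a sketch; the overhead constants are the ones given for gpt-3.5-turbo-0613 and may differ for other snapshots:

import tiktoken

def num_tokens_from_messages(messages, model='gpt-3.5-turbo-0613'):
    # Sketch of the cookbook's approach; overheads assume gpt-3.5-turbo-0613.
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # per-message framing
    tokens_per_name = 1     # extra token when a 'name' field is present
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == 'name':
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with the assistant header
    return num_tokens

print(num_tokens_from_messages([{'role': 'system', 'content': 'salute me!'}]))  # 11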

You can see what’s happening more clearly here: https://tiktokenizer.vercel.app/
