In my project I am using JSON mode with GPT-3.5-turbo-1106. I have noticed that my system prompt token count is roughly 283, but the usage object on the response claims it's 500 tokens.
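Roughly what the call looks like, as a stripped-down sketch (placeholder messages, assuming the openai >= 1.0 Python client; my real system prompt is the ~283-token one):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Reply in JSON."},  # placeholder
        {"role": "user", "content": "..."},  # placeholder
    ],
)

# What OpenAI reports (and bills), overhead included:
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
```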
I’ve been using this model A LOT lately but never thought of checking whether there is any discrepancy between the expected number of tokens and what the OpenAI response reports. I updated my prompt-testing library to collect that as well and did a quick test (using JSON mode of 3.5-1106):
       unexp_diff_prompt_tokens  unexp_diff_completion_tokens
count                 88.000000                          88.0
mean                  15.181818                           0.0
std                    0.578254                           0.0
min                   15.000000                           0.0
25%                   15.000000                           0.0
50%                   15.000000                           0.0
75%                   15.000000                           0.0
max                   17.000000                           0.0
Prompts were ~700 tokens each. As you can see, across the 88 requests I made the difference was ~15 tokens (the same held for much shorter prompts, so it seems to be a fixed overhead).
I’m using tiktoken’s encoding_for_model to calculate the expected token count. I wonder if the discrepancy may come from there? Does tiktoken know all of the model variants, and does it do the calculation the same way OpenAI does on the backend?
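For reference, the comparison boils down to something like this sketch (illustrative names and placeholder messages, not my actual library code):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")

messages = [
    {"role": "system", "content": "Respond in JSON."},
    {"role": "user", "content": "...a ~700-token prompt goes here..."},
]

# Naive expectation: just the tokens of the message contents.
expected_prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=messages,
)

# Difference between what OpenAI reports and the naive count; this is what
# lands in the unexp_diff_prompt_tokens column above.
print(response.usage.prompt_tokens - expected_prompt_tokens)
```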
As for your case, for the community to help you’ll need to share your code and prompts; guessing is not fun.
tiktoken does not know about model overhead, which is really the chat endpoint’s token overhead: the tokens used to enclose each message in its container and to form the real, unseen prompt.
It can be estimated as 7 tokens of overhead for the first message and 4 for each additional message, on top of the encoded content; and if you use a name parameter (such as “jacob”), you should also make a separate count for “:jacob” (including the colon) for each message that has one.
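As a rough sketch of that counting rule (estimate_prompt_tokens is my own illustrative helper, not an official formula, and the per-message overhead has varied across model versions):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")

def estimate_prompt_tokens(messages):
    """7 tokens of overhead for the first message, 4 for each additional one,
    plus the encoded content, plus ":<name>" when a name parameter is set."""
    total = 0
    for i, message in enumerate(messages):
        total += 7 if i == 0 else 4
        total += len(enc.encode(message["content"]))
        if "name" in message:
            total += len(enc.encode(":" + message["name"]))
    return total

messages = [
    {"role": "system", "content": "Respond only in JSON."},
    {"role": "user", "name": "jacob", "content": "Hello!"},
]
print(estimate_prompt_tokens(messages))
```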
Function specifications also use some tokens.
JSON mode is a GPT-4 that has been trained a bit differently.