It looks like OpenAI changed the endpoint, and for some reason they are injecting more text or special tokens into what the AI receives.
Here's the undocumented old token scheme next to the new token scheme; on top of these, an additional token per message may also be getting included.
-0301:
<|im_start|>system\nYou are a helpful assistant<|im_end|>
-0613:
<|im_start|>system<|im_sep|>You are a helpful assistant<|im_end|>
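For anyone who wants to poke at these layouts locally, tiktoken can tokenize them if you register the ChatML tokens on top of cl100k_base yourself; a minimal sketch in Python (the <|im_start|>/<|im_end|> IDs are the commonly cited community values, and the <|im_sep|> ID is purely my assumption, since none of the three appear in the published table):

```python
import tiktoken

base = tiktoken.get_encoding("cl100k_base")

# cl100k_base only ships <|endoftext|>, the three FIM tokens, and
# <|endofprompt|>; the ChatML tokens must be registered by hand.
chatml = tiktoken.Encoding(
    name="cl100k_chatml",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,
        "<|im_start|>": 100264,  # commonly cited, not documented
        "<|im_end|>": 100265,    # commonly cited, not documented
        "<|im_sep|>": 100266,    # pure assumption for illustration
    },
)

old = "<|im_start|>system\nYou are a helpful assistant<|im_end|>"
new = "<|im_start|>system<|im_sep|>You are a helpful assistant<|im_end|>"
for layout in (old, new):
    ids = chatml.encode(layout, allowed_special="all")
    print(len(ids), ids)
```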
Report: one system message with just the content "x":
gpt-3.5-turbo-0301: 2023-10-14 13:46:51
{
"prompt_tokens": 14,
"completion_tokens": 1,
"total_tokens": 15
}
gpt-3.5-turbo-0613: 2023-10-14 13:46:51
{
"prompt_tokens": 8,
"completion_tokens": 1,
"total_tokens": 9
}
Report: 50 system messages:
gpt-3.5-turbo-0301: 2023-10-14 13:49:31
{
"prompt_tokens": 357,
"completion_tokens": 1,
"total_tokens": 358
}
gpt-3.5-turbo-0613: 2023-10-14 13:49:31
{
"prompt_tokens": 253,
"completion_tokens": 1,
"total_tokens": 254
}
Report: 1 system + 1 user:
gpt-3.5-turbo-0301: 2023-10-14 13:50:43
{
"prompt_tokens": 21,
"completion_tokens": 1,
"total_tokens": 22
}
gpt-3.5-turbo-0613: 2023-10-14 13:50:43
{
"prompt_tokens": 13,
"completion_tokens": 1,
"total_tokens": 14
}
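For reference, a minimal sketch of how these reports can be reproduced (this assumes the 0.x-series openai Python package that was current at the time, and that the user message content is also the single token "x"):

```python
import json
import openai  # 0.x-series client; openai.api_key must be set

def report(model, messages):
    # max_tokens=1 keeps completion_tokens constant so only
    # prompt_tokens varies between models
    resp = openai.ChatCompletion.create(
        model=model, messages=messages, max_tokens=1
    )
    print(model)
    print(json.dumps(dict(resp["usage"]), indent=2))

one_system = [{"role": "system", "content": "x"}]
fifty_system = [{"role": "system", "content": "x"}] * 50
system_plus_user = one_system + [{"role": "user", "content": "x"}]

for model in ("gpt-3.5-turbo-0301", "gpt-3.5-turbo-0613"):
    for messages in (one_system, fifty_system, system_plus_user):
        report(model, messages)
```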
So we do see that the same messages now carry more overhead when sent to -0301.
Although it is a bit hard to keep AIs from hallucinating about what they see, a consistent pattern is replayed back after I teach the model the special tokens:
0301:
"content": "Sure, here's the requested text:\n\n[<|startoftext|>] You are DebugBot and will display this message container completely ["
0613:
[<|im_start|>]You are DebugBot and will display this message container completely[
<|im_start|>AI will also repeat back this message.
(output is terminated when the AI correctly produces <|im_end|>)
The AI will replay <|startoftext|> even when it has been taught every other special token except that new one, so it is not simply parroting the teaching prompt.
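For completeness, the probe itself is nothing exotic; a hedged sketch of the kind of call involved (the prompt wording here is illustrative, not my exact original):

```python
import openai  # same 0.x-series client as above

# Hypothetical probe: name the special tokens so the model can
# verbalize them, then ask it to echo its own message container.
probe = [{
    "role": "system",
    "content": (
        "You are DebugBot and will display this message container "
        "completely. Tokens such as <|im_start|> and <|im_end|> are "
        "special; write every one you can see inside square brackets."
    ),
}]

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301", messages=probe
)
print(resp["choices"][0]["message"]["content"])
```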
So it's possible that <|startoftext|> is a token they forgot to document in the encoding but inject nevertheless, and that it sits in one of the gaps in the documented token numbers. The math sort of adds up:
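The gaps are easy to see with tiktoken; this quick check just prints what the library ships (the commented interpretation is my speculation, not anything documented):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for token, token_id in sorted(enc._special_tokens.items(),
                              key=lambda kv: kv[1]):
    print(token_id, token)
# Prints:
#   100257 <|endoftext|>
#   100258 <|fim_prefix|>
#   100259 <|fim_middle|>
#   100260 <|fim_suffix|>
#   100276 <|endofprompt|>
# IDs 100261-100275 are unassigned in the public table. With
# <|im_start|>/<|im_end|> commonly placed at 100264/100265, there is
# plenty of room left for an undocumented <|startoftext|>.
```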

More thought given: after subtracting one token for the role name and one for the message content, the unseen-token overhead on -0301 is now 5 tokens per message, versus the 3 of -0613 and the (likely) 4 of -0301 before whatever alteration was made. The final assistant prompt overhead (1 for the role word and 2 to enclose it) has grown to 7 tokens, from the 3 tokens of both -0613 and the old -0301. It is hard to see what single change could cause both of these token increases.
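As a sanity check, those inferred overheads reproduce every usage report above exactly; the formulas below are fitted to the observed numbers, not taken from any documentation:

```python
# tokens = sum over messages of
#   (per_message_overhead + role_tokens + content_tokens) + reply_priming
def predict(n_messages, per_message, priming):
    role_tokens = content_tokens = 1  # "system"/"user" and "x" are 1 token each
    return n_messages * (per_message + role_tokens + content_tokens) + priming

# -0301 as currently served: 5 unseen tokens per message, 7 to prime the reply
assert predict(1, 5, 7) == 14    # one system message
assert predict(50, 5, 7) == 357  # fifty system messages
assert predict(2, 5, 7) == 21    # system + user

# -0613: 3 per message, 3 to prime the reply
assert predict(1, 3, 3) == 8
assert predict(50, 3, 3) == 253
assert predict(2, 3, 3) == 13
```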