Nope, there is no “boundary marker”, and no need for other theories. The playground simply uses the wrong tokenizer for the model. Consider these entries from tiktoken’s model-to-encoding map:
"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"text-davinci-001": "r50k_base",
Now paste a whole bunch of varied text and switch the playground between davinci-001 and -003. You will see that the token count doesn’t change, even though text-davinci-001 should be counted with r50k_base and text-davinci-003 with p50k_base.
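You can verify the expected behavior locally with tiktoken itself. A minimal sketch (the sample text is my own; the two encodings diverge most on runs of whitespace, such as indented code):

import tiktoken

# Code-like text with indentation; whitespace runs are where
# r50k_base and p50k_base differ most.
text = "def hello():\n    print('hello, world')\n\n" * 50

# encoding_for_model consults the same model-to-encoding map quoted above.
enc_001 = tiktoken.encoding_for_model("text-davinci-001")  # r50k_base
enc_003 = tiktoken.encoding_for_model("text-davinci-003")  # p50k_base

print(enc_001.name, len(enc_001.encode(text)))
print(enc_003.name, len(enc_003.encode(text)))
# The two counts typically differ here; the playground, by contrast,
# reports the same number for both models.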
(Screenshots: the playground’s bad count using the wrong BPE vs. the correct count.)
Switch tiktokenizer to text-davinci-001 and you get the playground’s mistaken token count.
Worry not about that which you cannot control: others’ code. Record your own token usage from what the API returns and compare it to your billing:
--response--
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": ""
    }
  ],
  "created": 1688568966,
  "id": "cmpl-xxx",
  "model": "text-babbage-001",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 1293,
    "total_tokens": 1293
  }
}
---
--response--
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nThis code provides a function that returns the encoding used by a given model name. It uses a dictionary to map model names to their corresponding encoding, and also checks for model names that match a known prefix. If the model name is not found, an error is raised."
    }
  ],
  "created": 1688568986,
  "id": "cmpl-xxx",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 56,
    "prompt_tokens": 1107,
    "total_tokens": 1163
  }
}
---
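Note the arithmetic in that second response checks out: 1107 prompt tokens + 56 completion tokens = 1163 total. To build such a record yourself, a minimal sketch (assuming the legacy openai Python SDK, pre-1.0, which matches the response shapes above; the usage_log.jsonl filename is just an example of mine):

import json
import time

import openai  # legacy SDK (openai<1.0); reads OPENAI_API_KEY from the environment

def logged_completion(**kwargs):
    # Call the completions endpoint and append the returned usage to a local log.
    response = openai.Completion.create(**kwargs)
    record = {
        "logged_at": time.time(),
        "id": response["id"],
        "model": response["model"],
        "usage": dict(response["usage"]),
    }
    with open("usage_log.jsonl", "a") as f:  # example filename, not from this thread
        f.write(json.dumps(record) + "\n")
    return response

response = logged_completion(
    model="text-davinci-003",
    prompt="Explain what this code does: ...",
    max_tokens=256,
)
print(response["usage"])

Summing the per-call usage records over a billing period gives you a number you can reconcile directly against your invoice.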