Hey Diet, thanks for chiming in here. You raise some great points regarding token usage and the comparison between free tokens on ChatGPT and chargeable tokens via the API. Even though my focus was on character count reduction, this opens an interesting discussion/dilemma.
Firstly, I need to mention that there were two small typos in my strings, which resulted in inaccurate conclusions (see below). As you noted, there are trade-offs here depending on whether you are paying per token via the API or using free tokens in GPTs.
The unintended increase in token count in the provided strings seems to be the result of two things:
- there were two unintended typos in the strings (apologies for that), and more importantly
- larger words such as “pizza” and “ingredients” appear to split into more tokens when concatenated (re-introducing some whitespace around them resolves this).
70 characters with spaces:
65 characters with no spaces, but an increase of 2 tokens:
Adding the space back after “pizza” brings the token count back to the baseline, at the cost of only 1 extra character for the restored space.
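If anyone wants to reproduce these counts themselves, here is a minimal Python sketch using tiktoken with the cl100k_base encoding. The example string is just a hypothetical stand-in, not the exact strings from my screenshots above.

```python
# Minimal check of character vs. token counts for a string with and
# without spaces (hypothetical stand-in text, not my original strings).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4 models

def counts(text: str) -> tuple[int, int]:
    """Return (character count, token count) for the given string."""
    return len(text), len(enc.encode(text))

with_spaces = "List the pizza ingredients and the steps to prepare the dough"
no_spaces = with_spaces.replace(" ", "")

print("with spaces:   ", counts(with_spaces))
print("without spaces:", counts(no_spaces))  # fewer characters, but tokens may go up
```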
I was able to replicate this while reducing the character count without increasing the token count. For example, on a larger scale, a prompt of 6010 characters and 1415 tokens came down to 5803 characters and 1393 tokens with no information loss, a small saving of 3.4% in characters and 1.56% in prompt tokens.
Trade-offs of reducing character count at the expense of increased tokens:
In cases where the prompt is token-sensitive, such as via the API, strategically removing spaces can reduce both token count and character count, but caution is needed around larger/complex words, as token usage can increase depending on how aggressively this is done (despite still saving on characters).
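One way to apply that caution is to only drop a space when doing so does not increase the token count. The helper below is a rough sketch of that idea under the cl100k_base encoding, not the exact process I used; it re-encodes the whole string on each check, so it is only practical for prompts of a few thousand characters.

```python
# Rough sketch: remove spaces one at a time, keeping any space whose
# removal would increase the token count (assumed helper, for illustration).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def strip_spaces_cautiously(text: str) -> str:
    result = text
    i = 0
    while i < len(result):
        if result[i] == " ":
            candidate = result[:i] + result[i + 1:]
            # Only accept the removal if the token count does not go up.
            if len(enc.encode(candidate)) <= len(enc.encode(result)):
                result = candidate
                continue  # recheck the character that shifted into position i
        i += 1
    return result
```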
Conversely, in a character-sensitive context such as a GPT configuration instruction, removing all whitespace does not seem to affect performance, despite any negligible increase in token count that may occur (and if anything you gain an extra 1-2k characters to squeeze in and improve the GPT, which is the use case that originally inspired this approach).
These are my initial findings, but there could be errors in some of the assumptions. I would be interested to hear about any similar findings others have. Thanks!