In this first image you’ll see a screenshot stating that the paragraph is 35 tokens.
Putting this through the tokenizer here…
As you can see, the OpenAI Tokenizer reports 45 tokens for the same paragraph, 10 more than the stated 35. That is a massive difference of about 28.6% (10 / 35).
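For anyone who wants to check the math or plug in their own numbers, here is a minimal sketch of the calculation. The function names and the `PRICE_PER_1K_TOKENS` value are placeholders I made up for illustration, not real OpenAI pricing, so swap in the rate for whatever model you actually use.

```python
def token_overage_percent(stated: int, measured: int) -> float:
    """Percentage by which the measured count exceeds the stated count."""
    return (measured - stated) / stated * 100


def estimated_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of a given number of tokens at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k


# Hypothetical price, for illustration only; not an actual OpenAI rate.
PRICE_PER_1K_TOKENS = 0.002

stated_tokens = 35    # count shown in the example screenshot
measured_tokens = 45  # count reported by the OpenAI Tokenizer

overage = token_overage_percent(stated_tokens, measured_tokens)
print(f"{overage:.1f}% more tokens than stated")  # 28.6% more tokens than stated

# The same ratio carries over to cost, whatever the real price is.
stated_cost = estimated_cost(stated_tokens, PRICE_PER_1K_TOKENS)
measured_cost = estimated_cost(measured_tokens, PRICE_PER_1K_TOKENS)
print(f"cost ratio: {measured_cost / stated_cost:.3f}")
```

Since cost scales linearly with token count, the percentage gap in tokens is the same percentage gap in the bill, regardless of which price you plug in.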
Not trying to accuse anyone, but as someone trying to start a small business using the OpenAI API, it’s becoming increasingly apparent that there is a real lack of transparency on tokenization, which makes it incredibly difficult (not impossible) for a small business owner to calculate costs. I really hope that in the future OpenAI makes it a priority to be transparent with developers, so they can understand the pricing behind THEIR OWN products and actually profit from them.
Regardless, and I know I’m just going to get a bunch of hate for this, it’s still not right that one example states the text is exactly 35 tokens while their own tokenizer says 45.
According to the OpenAI Tokenizer, I will actually be charged for about 28.6% more tokens than the website itself states. That’s not transparent, and it’s a massive cost difference to absorb as someone running a business.