The usual rule of thumb for converting between tokens used and words represented is about 0.75 words per token.
If we apply this rule to your document, then 227 tokens would, for average English text, encode about 170.25 words. Your example contains 159 words, a discrepancy of about 6.6%, which is well within the margin of error given that 0.75 words per token is only a rough approximation.
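For reference, here's a minimal sketch of that arithmetic in Python (the token and word counts are the ones from your example; nothing here depends on any particular tokenizer):

```python
# Rule-of-thumb conversion: ~0.75 words per token for average English text.
WORDS_PER_TOKEN = 0.75

tokens_used = 227   # token count reported for the document
actual_words = 159  # word count of the example text

estimated_words = tokens_used * WORDS_PER_TOKEN  # 170.25
# Relative discrepancy, measured against the estimate.
discrepancy = (estimated_words - actual_words) / estimated_words

print(f"Estimated words: {estimated_words:.2f}")  # 170.25
print(f"Actual words:    {actual_words}")         # 159
print(f"Discrepancy:     {discrepancy:.1%}")      # 6.6%
```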
I’m not sure where your value of 74% comes from; could you explain how you calculated it?