Explosion in the number of tokens / words generated

Ailogik · August 7, 2023, 2:11pm

Indeed, the number of tokens counted in usage seems to match what the tokeniser reports. The token count seems to explode because of the French language, which has a lot of accented characters. For example, the text tested here is 159 words long for 277 tokens, i.e. about 74% more tokens than words (277 / 159 ≈ 1.74).
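For anyone who wants to check the arithmetic, here is a minimal sketch of how that figure is derived. The function name `token_overhead` is just an illustrative helper, not part of any library, and the two counts are the ones quoted above (words from a plain word count, tokens from the tokeniser).

```python
def token_overhead(word_count: int, token_count: int) -> float:
    """Return the fraction of extra tokens relative to words.

    0.0 means one token per word; 0.74 means 74% more tokens than words.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return token_count / word_count - 1.0


if __name__ == "__main__":
    # Figures from the French sample text: 159 words, 277 tokens.
    overhead = token_overhead(159, 277)
    print(f"{overhead:.0%} more tokens than words")
```

Running the same function on an English sample of similar length would give a direct per-language comparison.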

This is enormous, and the growing number of French users on my app means that my costs are skyrocketing, even though I had calibrated my rates on the basis of a ratio of +25%.
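To put a rough number on the gap, here is a small sketch under two assumptions that are mine: that the 25% figure means 1.25 tokens per word, and that cost scales linearly with the token count. The helper name `cost_overrun` is hypothetical.

```python
def cost_overrun(word_count: int, token_count: int, budgeted_overhead: float) -> float:
    """Return how far actual token usage exceeds the budgeted usage.

    budgeted_overhead is the assumed extra-token fraction (0.25 for +25%).
    Assumes cost is proportional to the number of tokens.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    actual_tokens_per_word = token_count / word_count
    budgeted_tokens_per_word = 1.0 + budgeted_overhead
    return actual_tokens_per_word / budgeted_tokens_per_word - 1.0


if __name__ == "__main__":
    # 159 words, 277 tokens, rates calibrated on +25%.
    overrun = cost_overrun(159, 277, 0.25)
    print(f"about {overrun:.0%} over the budgeted cost per word")
```

With these figures, French text of this kind costs roughly 39% more per word than the rates were calibrated for.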

Too bad, I’ll deal with it.

Thanks for your answers
