How Efficient Are Single Characters? An Output-Length and Token Compression Benchmark for GPT Models

This experiment compares the output length and token compression efficiency of GPT-4 when repeating single characters such as “い”, “ぬ”, “e”, and “z”.

The results show that frequently used characters and languages tend to be more token-efficient, leading to significantly longer outputs under the same token limit.
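A rough way to sanity-check the token-efficiency claim locally is to count how many tokens a run of each character consumes with `tiktoken`. This is only a sketch: it assumes GPT-4's `cl100k_base` encoding is what matters, and the run length `N` below is an arbitrary choice, not the setting from the experiment.

```python
# Sketch: count how many tokens a run of each single character consumes.
# Assumes tiktoken's GPT-4 encoding is representative of what the model sees.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
N = 1000  # length of the repeated-character string (illustrative)

for ch in ["い", "ぬ", "e", "z"]:
    tokens = enc.encode(ch * N)
    print(f"{ch!r}: {len(tokens)} tokens for {N} characters "
          f"({N / len(tokens):.2f} chars per token)")
```

Characters that compress into fewer tokens leave more of the token budget for output, which is why they can be repeated for longer under the same limit.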

Additionally, a “hallucination” was observed: the model failed to include an explicitly instructed ending sentence. This hints at how GPT-4 handles output as it approaches the token limit.
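One quick check for whether the missing ending sentence is simply the output being cut at the token limit is the `finish_reason` reported by the API. Below is a minimal sketch using the OpenAI Python client; the prompt wording, the `ENDING` string, and the `max_tokens` value are placeholders, not the exact settings from the experiment.

```python
# Sketch: see whether an instructed ending sentence survives a tight max_tokens.
# The prompt wording and limit below are illustrative, not the original setup.
from openai import OpenAI

client = OpenAI()
ENDING = "END OF OUTPUT"  # hypothetical ending sentence

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Repeat the character 'い' as many times as possible, "
                   f"then finish with the exact sentence: {ENDING}",
    }],
    max_tokens=200,
)

choice = resp.choices[0]
print("finish_reason:", choice.finish_reason)            # "length" means the output was truncated
print("ending present:", ENDING in choice.message.content)
```

If `finish_reason` is `"length"`, the ending sentence was most likely dropped because the model ran out of budget rather than because it ignored the instruction.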

ChatGPT #tokenlimits #promptengineering #hallucination


Dear vb
Thanks for checking and adjusting the category so quickly!
Also, I appreciate the like — glad the topic resonated even a bit.