I’m trying to estimate how many tokens will be required for (summaries of) source code files, in a variety of programming languages. Are there any rules of thumb for this mapping? For natural languages, it appears that 1 token ~ 4 characters is the average. Is that reasonable for source code as well…

Rules of Thumb for number of source code characters to tokens

restlessronin February 13, 2024, 5:56am 5

Thanks. I also found this older post that was interesting.

I finally picked a simple character count for the samples, and left the choice of tokenizer to the library client code.
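A minimal sketch of that approach, not the actual library code: the default is a plain character count scaled by the 4-characters-per-token figure quoted in the question (an assumption for source code, where the real ratio depends on the language and the tokenizer), and the caller can pass in its own counting function instead. The names `estimate_tokens`, `tokenizer`, and `chars_per_token` are hypothetical.

```python
import math
from typing import Callable, Optional

# Rough natural-language average quoted in the question.
# Assumption only: the ratio for source code may differ.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(
    text: str,
    tokenizer: Optional[Callable[[str], int]] = None,
    chars_per_token: float = CHARS_PER_TOKEN,
) -> int:
    """Return a token count for `text`.

    If the client supplies `tokenizer` (a function mapping text to a
    token count), use it. Otherwise fall back to a character-count
    estimate, rounded up.
    """
    if tokenizer is not None:
        return tokenizer(text)
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    # Character-based estimate (no tokenizer supplied).
    print(estimate_tokens(sample))
    # Client-supplied counter; whitespace splitting is a stand-in here.
    print(estimate_tokens(sample, tokenizer=lambda s: len(s.split())))
```

A client that needs exact counts could pass a function that wraps a real tokenizer, for example one that returns the length of the token list produced by a `tiktoken` encoding.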

