Counting Tokens and Rendering Content in HTML (Not the tags)

I am aware that gpt-3-encoder does not support GPT-4's cl100k_base encoding. I only used it because it was the library recommended on OpenAI's token-counting page (or their cookbook), and I wasn't sure which other available libraries to trust. It seems they have since updated that page to recommend @dqbd/tiktoken; when I shared my code in the previous message, it still listed gpt-3-encoder. It was good enough for my usage, but I will switch, especially now that this is what they recommend.

I am clearly aware of the token limit. I wasn't trying to put all 300k characters into the prompt; the contents vary widely in length, from 0 to 500k characters. What I usually do is count the tokens and truncate the content accordingly before passing it into the prompt. I have since changed to counting tokens in chunks to reduce the performance impact, so I only encode what I need and ignore the rest of the tokens in the 300k characters. I was curious whether there is another way or tool for this.
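For what it's worth, here is a minimal sketch of the chunk-wise counting-and-truncation idea described above. The `encode` function here is a naive word-splitting stand-in so the example is self-contained; in real use you would swap in a proper encoder such as the cl100k_base one from @dqbd/tiktoken. Chunk sizes, the 4000-character default, and the binary search on the last chunk are all illustrative choices, not anything from a specific library.

```javascript
// Stand-in encoder (~1 token per whitespace-separated word).
// Replace with a real BPE encoder, e.g. @dqbd/tiktoken cl100k_base.
const encode = (text) => text.split(/\s+/).filter(Boolean);

// Truncate `text` so its token count stays within `maxTokens`,
// encoding one chunk at a time instead of the whole string at once,
// and stopping as soon as the limit is reached.
function truncateToTokenLimit(text, maxTokens, chunkChars = 4000) {
  let used = 0;
  let kept = "";
  for (let i = 0; i < text.length; i += chunkChars) {
    const chunk = text.slice(i, i + chunkChars);
    const n = encode(chunk).length;
    if (used + n <= maxTokens) {
      kept += chunk;
      used += n;
      continue;
    }
    // Final chunk: binary-search the largest prefix that still fits.
    let lo = 0, hi = chunk.length;
    while (lo < hi) {
      const mid = Math.ceil((lo + hi) / 2);
      if (used + encode(chunk.slice(0, mid)).length <= maxTokens) lo = mid;
      else hi = mid - 1;
    }
    kept += chunk.slice(0, lo);
    break; // everything past the limit is ignored, never encoded
  }
  return kept;
}
```

One caveat with a real BPE encoder: splitting at an arbitrary character boundary can cut a token in half, so per-chunk counts may differ from the whole-string count by a token or two per boundary. Leaving a small safety margin below the hard limit covers that.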