What is the difference between the GPT-2 and GPT-3 tokenizers?

sweety.tripathi · April 24, 2023, 5:49am

Hello,
I am curious to know what exactly the difference is between the GPT-2 and GPT-3 tokenizers. Both are BPE-based, so what changed in the BPE to make them different? The notebook [openai-cookbook/How_to_count_tokens_with_tiktoken.ipynb at main · openai/openai-cookbook · GitHub] shows different encodings for different models, but for the same example strings, gpt2 and p50k_base produce the same token values. I tried more examples and found that the two are the same.
If they are the same, why use tiktoken? I simply used
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
to count tokens for GPT-3 models as well.
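Whether GPT2TokenizerFast is good enough depends on which encoding the target model uses. Below is a minimal sketch of that decision. It assumes the model names and encodings listed, which are a hand-copied subset of tiktoken's model-to-encoding mapping and may drift. Under that assumption, the GPT-2 tokenizer should match gpt2 and r50k_base exactly and roughly match p50k_base. It should not match cl100k_base or o200k_base at all. The helpers `encoding_for` and `gpt2_tokenizer_fit` are hypothetical names for illustration, not part of any library.

```python
# Sketch: which tiktoken encoding a model uses, and whether the Hugging Face
# GPT-2 tokenizer (GPT2TokenizerFast) is a reasonable stand-in for counting.
# ASSUMPTION: these tables are a hand-copied subset of tiktoken's
# model-to-encoding mapping and may go out of date.

MODEL_TO_ENCODING = {
    "gpt2": "gpt2",
    "davinci": "r50k_base",  # original GPT-3 base models
    "curie": "r50k_base",
    "babbage": "r50k_base",
    "ada": "r50k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "gpt-4o": "o200k_base",
}

# Prefixes for dated or variant model names such as "gpt-4o-mini".
PREFIX_TO_ENCODING = [
    ("gpt-4o-", "o200k_base"),
    ("gpt-4-", "cl100k_base"),
    ("gpt-3.5-turbo-", "cl100k_base"),
]


def encoding_for(model: str) -> str:
    """Return the encoding name for a model, trying exact names first."""
    if model in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model]
    for prefix, encoding in PREFIX_TO_ENCODING:
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"unknown model: {model!r}")


def gpt2_tokenizer_fit(model: str) -> str:
    """Classify how well the GPT-2 vocabulary matches the model's encoding.

    "exact":  gpt2 / r50k_base, expected to give identical token IDs.
    "approx": p50k_base, same IDs except where it uses its extra
              whitespace-run tokens (repeated spaces, e.g. indented code).
    "no":     a different vocabulary (cl100k_base, o200k_base, ...).
    """
    encoding = encoding_for(model)
    if encoding in ("gpt2", "r50k_base"):
        return "exact"
    if encoding == "p50k_base":
        return "approx"
    return "no"


if __name__ == "__main__":
    for name in ["davinci", "text-davinci-003", "gpt-3.5-turbo-0613", "gpt-4o-mini"]:
        print(f"{name:20} {encoding_for(name):12} {gpt2_tokenizer_fit(name)}")
```

When tiktoken is installed, `tiktoken.encoding_for_model(model)` performs this lookup with an up-to-date table, and `len(enc.encode(text))` gives the count directly. That sidesteps the question of whether a stand-in tokenizer matches.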

I asked ChatGPT, and it answered "gpt2-merged". I don't know how accurate that is, and I haven't found anything related to it. Screenshot below.

Thanks in advance.
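On the "what changed in the BPE" question: as far as I know, the merge algorithm is the same across tiktoken's gpt2, r50k_base and p50k_base encodings, and what differs is the vocabulary. gpt2 and r50k_base appear to share the same 50,257-token table. p50k_base has 50,281 tokens, and to my understanding the 24 extra ones are tokens for runs of spaces (aimed at code), given IDs after the existing ones. That would explain why ordinary sentences produce identical IDs. The toy sketch below illustrates the mechanism with made-up, hypothetical merge tables (`BASE_MERGES`, `EXTENDED_MERGES`). Appending merges to a table leaves every old ID unchanged and only changes the output for inputs containing the new patterns. It skips the regex pre-splitting and byte-level handling that real GPT tokenizers do.

```python
# Toy BPE: the same merge loop run with two merge tables.
# ASSUMPTION: ALPHABET, BASE_MERGES and EXTENDED_MERGES are made up for
# illustration; they are not OpenAI's real tables. Real GPT tokenizers also
# pre-split text with a regex and work on bytes, which this sketch skips.

ALPHABET = [" ", "e", "l", "o", "r", "w"]
BASE_MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
# The "extended" table keeps every old merge and appends whitespace merges.
EXTENDED_MERGES = BASE_MERGES + [(" ", " "), ("  ", "  ")]


def build_vocab(merges):
    """Assign IDs: single characters first, then merged tokens in merge order."""
    vocab = {ch: i for i, ch in enumerate(ALPHABET)}
    for left, right in merges:
        vocab.setdefault(left + right, len(vocab))
    return vocab


def bpe(text, merges):
    """Repeatedly merge the adjacent pair with the lowest merge rank."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    parts = list(text)
    while len(parts) > 1:
        # (rank, position) of every adjacent pair; unknown pairs rank as inf.
        candidates = [
            (ranks.get((parts[i], parts[i + 1]), float("inf")), i)
            for i in range(len(parts) - 1)
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no mergeable pair left
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


def encode(text, merges):
    """Run BPE, then map each resulting token string to its ID."""
    vocab = build_vocab(merges)
    return [vocab[token] for token in bpe(text, merges)]


if __name__ == "__main__":
    for text in ["lower low", "lower    low"]:
        print(repr(text), encode(text, BASE_MERGES), encode(text, EXTENDED_MERGES))
    # 'lower low' [9, 0, 7] [9, 0, 7]
    # 'lower    low' [9, 0, 0, 0, 0, 7] [9, 11, 7]
```

Because the extended table is a strict superset appended at the end, no existing token ID shifts; only the run of spaces encodes differently. A genuinely new vocabulary such as cl100k_base would not have this property, which is why it cannot be approximated with the GPT-2 tokenizer.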

mikoo231 · February 21, 2024, 3:52pm

Straight from the horse’s mouth

Related topics

Topic	Category	Tags	Replies	Views	Activity
Is Tokenizer.from_pretrained("gpt2") the same tokenizer used in your GPT3 and ChatGPT models?	API		10	6033	March 8, 2023
What tokenizer is GPT4.1 one using?	API	gpt-41	0	182	April 23, 2025
Chat Token counts inconsistency between playground platform and tiktokenizer	API	chatgpt, token	2	657	December 27, 2024
Counting tokens for chat API calls (gpt-3.5-turbo)	Documentation		5	27460	December 13, 2023
Official tokenizer has huge count difference from OpenAI tokenizer	API		12	4889	October 1, 2023