I am curious about something.
What exactly is the difference between the GPT-2 and GPT-3 tokenizers? Both are BPE-based, so what changed in the BPE to make them different? There is a notebook [openai-cookbook/How_to_count_tokens_with_tiktoken.ipynb at main · openai/openai-cookbook · GitHub] which shows different encodings for different models, but for the same example strings, `gpt2` and `p50k_base` actually produce the same token values. I tried more examples and found the two matched every time.
If both are the same, then why would I use tiktoken at all? I simply used
`tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")` to count tokens for GPT-3 models as well.
I asked ChatGPT, and it answered "gpt2-merged". I don't know how accurate that is; I haven't found anything related to it. Screenshot below.
Thanks in advance.