Hi Raymond, I don’t know if you knew this, but you can use the exact tokenizer for GPT-3 (specific to each flavor) with the open-source tool called tiktoken.
It depends on the model you use, but here is a table showing which tokenizer to use for each GPT-3 model:
| Encoding name | OpenAI models |
|---|---|
| `gpt2` (or `r50k_base`) | Most GPT-3 models |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `cl100k_base` | `text-embedding-ada-002` |
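
If you want to try it, here is a minimal sketch in Python (assuming you have tiktoken installed via `pip install tiktoken`; the model name and sample text are just placeholders):

```python
import tiktoken

# Pick the encoding that matches your model, e.g. p50k_base for text-davinci-003...
enc = tiktoken.get_encoding("p50k_base")

# ...or let tiktoken resolve the right encoding from the model name directly.
enc = tiktoken.encoding_for_model("text-davinci-003")

tokens = enc.encode("Hello, Raymond!")
print(tokens)              # list of token ids
print(len(tokens))         # token count, handy for estimating prompt size
print(enc.decode(tokens))  # round-trips back to the original string
```

The nice part is that `encoding_for_model` saves you from having to memorize the table above, but knowing which encoding each model uses still helps when you are counting tokens offline.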