How do you make a BPE file for the tokenizer?

The tokenizer can be downloaded from this page

There is a tokenizer for GPT-3.5 and GPT-4 (cl100k)
and a tokenizer for Davinci (p50k)

The cl100k tokenizer also applies to ada-002 (text-embedding-ada-002) for embeddings
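
If it helps, here is a rough sketch of that model-to-tokenizer mapping as a lookup table. The encoding names (cl100k_base, p50k_base) are the ones tiktoken uses. The helper name encoding_for is made up for illustration, and I am assuming "Davinci" means text-davinci-003.

```python
# Encoding names as used by OpenAI's tiktoken library.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    # Assumption: "Davinci" in the post refers to text-davinci-003.
    "text-davinci-003": "p50k_base",
}


def encoding_for(model: str) -> str:
    """Return the encoding name for a model (hypothetical helper).

    Dated variants such as "gpt-4-0613" are matched by prefix.
    """
    if model in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model]
    for prefix, encoding in MODEL_TO_ENCODING.items():
        if model.startswith(prefix + "-"):
            return encoding
    raise KeyError(f"no known encoding for model {model!r}")
```

So encoding_for("gpt-4-0613") and encoding_for("text-embedding-ada-002") both come back as cl100k_base, and you only need the p50k file if you are still calling Davinci.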

The code was extracted from a larger MVC project, so you will need to change the hardcoded path to the tiktoken files.
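
For anyone who would rather not hardcode the path, here is a minimal Python sketch (not the original project code) that reads a .tiktoken file from a configurable folder. It relies on the .tiktoken format being one line per token: the base64-encoded token bytes, a space, then the integer rank. The TIKTOKEN_DIR environment variable and the function names are my own assumptions for illustration.

```python
import base64
import os
from pathlib import Path

# Hypothetical environment variable; falls back to the current directory.
TIKTOKEN_DIR = Path(os.environ.get("TIKTOKEN_DIR", "."))


def parse_tiktoken(text: str) -> dict[bytes, int]:
    """Parse .tiktoken contents into a {token_bytes: rank} table."""
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


def load_ranks(name: str, directory: Path = TIKTOKEN_DIR) -> dict[bytes, int]:
    """Load e.g. cl100k_base.tiktoken from the configured directory."""
    path = directory / f"{name}.tiktoken"
    return parse_tiktoken(path.read_text(encoding="ascii"))
```

With the files in place, load_ranks("cl100k_base") returns the merge-rank table, and changing where the files live is just a matter of setting TIKTOKEN_DIR rather than editing the code.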
