TikToken.GetEncoding Hangs or Freezes

Using the TiktokenSharp library, the following line of C# code appears to hang or freeze:

TikToken tikToken = TikToken.GetEncoding("cl100k_base");

Why?

The official tiktoken retrieves its encoding files over the network. They are not huge, but may still take some time to download. This appears to use the same. The location where the file is saved needs to be writable by the account the program runs under. You can check the docs of that unofficial package for how to configure the path.
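To rule out a permissions problem, something like the following could be used to test whether the process can write to a given directory. This is only a sketch using standard .NET file APIs: the `cacheDir` path and the `CanWrite` helper are made up for illustration and are not part of TiktokenSharp, so substitute whatever directory your setup actually uses.

```csharp
using System;
using System.IO;

// Hypothetical directory for illustration only; replace it with the
// location your setup actually saves the encoding files to.
string cacheDir = Path.Combine(AppContext.BaseDirectory, "bpe");

Console.WriteLine(CanWrite(cacheDir)
    ? $"Writable: {cacheDir}"
    : $"NOT writable: {cacheDir}");

// Hypothetical helper: tries to create the directory and write,
// then delete, a throwaway file inside it.
static bool CanWrite(string dir)
{
    try
    {
        Directory.CreateDirectory(dir);
        string probe = Path.Combine(dir, Path.GetRandomFileName());
        File.WriteAllText(probe, "probe");
        File.Delete(probe);
        return true;
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
    {
        // Access denied, read-only location, or another I/O failure.
        return false;
    }
}
```

If this reports the directory as not writable, the save step is a likely suspect and running from a different location or changing the configured path would be the next thing to try.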

How long should this retrieval take: seconds or minutes? I don’t understand your sentence “This appears to use the same.” What appears to use the same what?

The TiktokenSharp library needs to fetch the actual BPE tokenizer files, specifically one or more of the following:

https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken

It generally doesn’t take long, just a few seconds.
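If you want to check whether the network is the slow part on your machine, you could time a download of one of those files yourself. The sketch below uses only the standard `HttpClient`, not TiktokenSharp, and the 30-second timeout is an arbitrary value picked for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

// One of the encoding files listed above.
const string url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken";

// The 30-second timeout is an arbitrary choice for this test.
using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
var stopwatch = Stopwatch.StartNew();

try
{
    byte[] data = await http.GetByteArrayAsync(url);
    Console.WriteLine($"Downloaded {data.Length} bytes in {stopwatch.Elapsed.TotalSeconds:F1} s");
}
catch (TaskCanceledException)
{
    // HttpClient reports an elapsed timeout as a cancelled task.
    Console.WriteLine("Timed out after 30 s; a proxy or firewall may be blocking the request.");
}
catch (HttpRequestException ex)
{
    Console.WriteLine($"Request failed: {ex.Message}");
}
```

If this finishes in a few seconds but GetEncoding still hangs, the network is probably not the cause and the save location is the next thing to look at.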
