TikToken.GetEncoding Hangs or Freezes

Using the TiktokenSharp library, the following line of C# code appears to hang or freeze:

TikToken tikToken = TikToken.GetEncoding(“cl100k_base”);

Why?

Official Tiktoken has remote network files to retrieve. They are not huge, but may still take some time. This appears to use the same. The file save location would need to be writable by the rights of program being executed. You can check docs of that unofficial package for configuring the path.

3 Likes

How long should this retrieval take, seconds, minutes? I don’t understand your sentence “This appears to use the same.” What appears to use the same what?

TiktokenSharp library needs to fetch the actual BPE tokenizers, more specifically one or more of the following:

https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken

Generally doesn’t take too long, seconds.

2 Likes

I downloaded the cl100k_base tokenizers and added it to the project as shown:

        mergesPath = Path.Combine(
            AppDomain.CurrentDomain.BaseDirectory,
            "Data",
            "cl100k_base.tiktoken"
        );

And the tokenizers are referenced as follows:

TikToken tikToken = TikToken.GetEncoding(mergesPath);

The following runtime error occurs at the above line of code:

System.NotImplementedException
HResult=0x80004001
Message=Unsupported model
Source=TiktokenSharp
StackTrace:
at TiktokenSharp.Services.EncodingManager.GetEncoding(String encodingName)
at TiktokenSharp.Services.EncodingManager.GetEncodingSetting(String modelOrEncodingName)
at TiktokenSharp.TikToken.GetEncoding(String encodingName)
at gfChatBot01.Form1.computeTokensBtn_Click(Object sender, EventArgs e) in C:\myVS22Apps\gfChatBot01\Form1.cs:line 82
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(HWND hWnd, MessageId msg, WPARAM wparam, LPARAM lparam)

The model is unsupported. Is there some more recent version of TiktokenSharp other than version 1.1.6 in Nuget?

I’m running C# in Visual Studio 2022.

Sorry about that, different problem.